DNA sequences coding for a cinnamoyl CoA reductase, and applications thereof in the control of lignin contents in plants

ABSTRACT

The present invention relates to any DNA sequence comprising, as a coding region, all or part of the nucleotide sequence coding for an mRNA coding for a cinnamoyl-CoA reductase (CCR) in lucerne and/or maize, or all or part of the nucleotide sequence complementary to the latter and coding for an antisense mRNA capable of hybridizing with said mRNA. The invention also relates to the use of said sequences for implementing processes for the regulation of lignin biosynthesis in plants.

The present invention relates to the use of DNA sequences which code for a cinnamoyl-CoA reductase (CCR) in plants, or any fragment of these sequences, or also any sequence derived from the latter, or their complementary sequences, in the context of carrying out processes for regulating the level of lignin in plants.

Lignin is a complex heterogeneous aromatic polymer which renders the walls of certain plant cells impermeable and reinforces them.

Lignin is formed by polymerization of free radicals derived from monolignols, such as paracoumaryl, coniferyl and sinapyl alcohols (Higuchi, 1985, in Biosynthesis and degradation of wood components (T. Higuchi, ed.), Academic Press, Orlando, Fla. pp. 141-160).

Lignins vary widely in their relative content of monolignols, as a function of the species and of the various tissues within the same plant.

This variation is probably caused and controlled by the different activities and substrate specificities of the enzymes necessary for biosynthesis of lignin monomers (Higuchi, 1985, loc. cit.).

Beyond its role in the structure and development of plants, lignin represents a major component of the terrestrial biomass and assumes a major economic and ecological significance (Brown, 1985, J. Appl. Biochem. 7, 371-387; Whetten and Sederoff, 1991, Forest Ecology and Management, 43, 301-316).

At the level of exploitation of the biomass, it is appropriate first to note that lignin is a limiting factor of the digestibility and nutritional yield of fodder plants. In fact, it is clearly demonstrated that the digestibility of fodder plants by ruminants is inversely proportional to the content of lignin in these plants, the nature of the lignins also being a determining factor in this phenomenon (Buxton and Roussel, 1988, Crop Sci., 28, 553-558; Jung and Vogel, 1986, J. Anim. Sci., 62, 1703-1712).

Among the main fodder plants in which it would be of interest to reduce the lignin contents there may be mentioned: lucerne, fescue, maize, fodder used for silage, etc.

It should also be noted that high lignin contents are partly responsible for the limited quality of sunflower cake intended for feeding cattle, and for the reduction in germinative capacities of certain seeds in the horticultural sector.

It may also be emphasized that the intense lignification which occurs during preservation of plant components after harvesting rapidly renders products such as asparagus, yam, carrots, etc. unfit for consumption.

Furthermore, it is also appropriate to note that more than 50 million tonnes of lignin are extracted from ligneous material each year in the context of production of paper pulp in the paper industry. This extraction operation, which is necessary to obtain cellulose, is costly in energy and also causes pollution through the chemical compounds used for the extraction, which are found in the environment (Dean and Eriksson, 1992, Holzforschung, 46, 135-147; Whetten and Sederoff, 1991, loc. cit.).

To reduce the proportions of lignins (which make up 20 to 30% of the dry matter, depending on the species) to a few per cent (2 to 5%) would represent an increase in yield and a substantial saving (chemical products), and would contribute to improving the environment (reduction in pollution). Given the scale of use of ligneous material, these decreases would have extremely significant repercussions. In this case, the species concerned could be poplar, eucalyptus, Acacia mangium, the genus Casuarina and all the angiosperms and gymnosperms used for the production of paper pulp.

It is clear that in the two sectors under consideration, the reduction in the levels of lignins must be moderated to preserve the characteristics of rigidity and the normal architecture of the plant (or the tree), since the lignins which strengthen the cell walls play a significant role in maintaining the erect habit of plants.

The natural variations in the lignin contents observed in nature for the same species (deviations which can be up to 6-8% of the dry matter among individuals) justify the reductions suggested above.

The resistance of lignin to degradation, like the difficulties encountered in the context of its extraction, is probably due to the complex structure of this polymer, which is made up of ether bonds and carbon-carbon bonds between the monomers, as well as to the numerous chemical bonds which exist between the lignin and the other components of the cell wall (Sarkanen and Ludwig, 1971, in Lignins: Occurrence, Formation, Structure and Reactions (K. V. Sarkanen and C. H. Ludwig ed.) New York: Wiley-Interscience, pp. 1-18).

Starting from cinnamoyl-CoA, the biosynthesis of lignins in plants is effected in the following manner:

An approach to attempt to reduce the level of lignins in plants by genetic engineering would consist of inhibiting the synthesis of one of the enzymes in the biosynthesis chain of these lignins indicated above.

A particularly suitable technique in the context of such an approach is to use antisense mRNA which is capable of hybridizing with the mRNA which codes for these enzymes, and consequently to prevent, at least partly, the production of these enzymes from their corresponding mRNA.

Such an antisense strategy carried out with the aid of the gene which codes for the CAD in tobacco was the subject matter of European Patent Application no. 584 117, which describes the use of antisense mRNA which is capable of inhibiting the production of lignins in plants by hybridizing with the mRNA which codes for the CAD in these plants.

The results in the plants transformed in this way demonstrate a reduction in the activity of the CAD, but paradoxically the contents of lignins show no change. Complementary studies indicate that the lignins of transformed plants are different from control lignins, since the cinnamylaldehydes are incorporated directly into the lignin polymer.

One of the aims of the present invention is specifically that of providing a process which allows effective regulation of the contents of lignins in plants, either in the sense of a considerable reduction in these contents with respect to the normal contents in plants, or in the sense of an increase in these contents.

Another aim of the present invention is to provide tools for carrying out such a process, and more particularly constructions which can be used for the transformation of plants.

Another aim of the present invention is to provide genetically transformed plants, in particular fodder plants which can be digested better than non-transformed plants, or also transformed plants or trees for the production of paper pulp, from which the extraction of lignins would be facilitated and less polluting than in the case of non-transformed trees.

Another aim of the present invention is that of providing transformed plants which are more resistant to attacks from the environment, in particular to parasitic attacks, than the non-transformed plants are, or also transformed plants of a larger size, or of a smaller size (than that of the non-transformed plants).

The present invention relates to the use of recombinant nucleotide sequences containing one (or more) coding region(s), this (these) coding region(s) being made up of a nucleotide sequence chosen from the following:

the nucleotide sequence represented by SEQ ID NO 1 which codes for an mRNA, this mRNA itself coding for the cinnamoyl-CoA reductase (CCR) of lucerne represented by SEQ ID NO 2,

the nucleotide sequence represented by SEQ ID NO 3 which codes for an mRNA, this mRNA itself coding for the CCR of maize represented by SEQ ID NO 4,

a fragment of the nucleotide sequence represented by SEQ ID NO 1, or of that represented by SEQ ID NO 3, this fragment coding for a fragment of the CCR represented by SEQ ID NO 2 or for a fragment of the CCR represented by SEQ ID NO 4 respectively, this CCR fragment having an enzymatic activity equivalent to that of the two abovementioned CCRs,

the nucleotide sequence complementary to that represented by SEQ ID NO 1 or SEQ ID NO 3, this complementary sequence coding for an antisense mRNA which is capable of hybridizing with the mRNA coded by the sequences SEQ ID NO 1 and SEQ ID NO 3 respectively,

a fragment of the nucleotide sequence complementary to that represented by SEQ ID NO 1 or SEQ ID NO 3, this sequence fragment coding for an antisense mRNA which is capable of hybridizing with the mRNA which itself codes for the CCR represented by SEQ ID NO 2, or with the mRNA which itself codes for the CCR represented by SEQ ID NO 4 respectively,

the nucleotide sequence derived from the sequence represented by SEQ ID NO 1 or SEQ ID NO 3, in particular by mutation and/or addition and/or suppression and/or substitution of one or more nucleotides, this derived sequence coding either for an mRNA which itself codes for the CCR represented by SEQ ID NO 2 or SEQ ID NO 4 respectively, or for a fragment or a protein derived from the latter, this fragment or derived protein having an enzymatic activity equivalent to that of the said CCRs in plants,

the nucleotide sequence derived from the abovementioned complementary nucleotide sequence, or from the fragment of this complementary sequence as is described above, by mutation and/or addition and/or suppression and/or substitution of one or more nucleotides, this derived sequence coding for an antisense mRNA which is capable of hybridizing with one of the abovementioned mRNAs,

for transformation of plant cells in order to obtain transgenic plants within which the biosynthesis of lignins is regulated either in the sense of an increase or in the sense of a reduction in the contents of lignins produced, with respect to the normal contents of lignins produced in the plants, and/or in the sense of a modification of the composition of the lignins produced by the said transgenic plants with respect to the lignins produced in the non-transformed plants, in particular by carrying out one of the processes, described below, for regulation of the amount of lignin in the plants.

“Derived nucleotide sequence” in the text above and below is understood as meaning any sequence having at least about 50% (preferably at least 70%) of nucleotides homologous to those of the sequence from which it is derived.

“Derived protein” in the text above and below is understood as meaning any protein having at least about 50% (preferably at least 70%) of amino acids homologous to those of the protein from which it is derived.
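The 50% and 70% thresholds in the two definitions above can be illustrated with a minimal sketch. The text does not say how homology is to be computed, so the sketch assumes the simplest reading: position-by-position identity between two sequences that have already been aligned to the same length. The function names `percent_identity` and `is_derived` and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions carrying the same residue.

    ASSUMPTION: both sequences are already aligned and of equal length;
    a gap character '-' never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


def is_derived(seq_a: str, seq_b: str, threshold: float = 50.0) -> bool:
    """True when identity reaches the threshold (50 by default, 70 preferred)."""
    return percent_identity(seq_a, seq_b) >= threshold


# Toy 10-base example: 8 of 10 positions agree.
print(percent_identity("ATGCATGCAT", "ATGCATGGTT"))  # 80.0
print(is_derived("ATGCATGCAT", "ATGCATGGTT", threshold=70.0))  # True
```

The same function works for amino acid strings, since it only compares characters. A real comparison would first need an alignment step, which is outside this sketch.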

The present invention more particularly relates to any DNA sequence, characterized in that it comprises, as the coding region:

the nucleotide sequence represented by SEQ ID NO 1 which codes for an mRNA, this mRNA itself coding for the CCR represented by SEQ ID NO 2, or

a fragment of the abovementioned nucleotide sequence, this fragment coding for a fragment of the CCR represented by SEQ ID NO 2, this CCR fragment having an enzymatic activity equivalent to that of the abovementioned CCR, or

any nucleotide sequence derived from the abovementioned sequence represented by SEQ ID NO 1, or from a fragment, as is described above, of this sequence, in particular by mutation and/or addition and/or suppression and/or substitution of one or more nucleotides, this derived sequence coding for an mRNA which itself codes for the CCR represented by SEQ ID NO 2, or for a protein derived from the latter and having an enzymatic activity equivalent to that of the said CCR in plants.

The present invention more particularly relates to any DNA sequence, characterized in that it contains, as the coding region:

the nucleotide sequence represented by SEQ ID NO 3 which codes for an mRNA, this mRNA itself coding for the CCR represented by SEQ ID NO 4, or

a fragment of the abovementioned nucleotide sequence, this fragment coding for a fragment of the CCR represented by SEQ ID NO 4, this CCR fragment having an enzymatic activity equivalent to that of the abovementioned CCR, or

any nucleotide sequence derived from the abovementioned sequence represented by SEQ ID NO 3, or from a fragment, as is described above, of this sequence, in particular by mutation and/or addition and/or suppression and/or substitution of one or more nucleotides, this derived sequence coding for an mRNA which itself codes for the CCR represented by SEQ ID NO 4, or for a protein derived from the latter and having an enzymatic activity equivalent to that of the said CCR in plants.

Protein having an enzymatic activity equivalent to that of the CCRs present in plants, and more particularly the CCRs represented by SEQ ID NO 2 and SEQ ID NO 4, is understood as meaning any protein which possesses a CCR activity as measured by the method of Luderitz and Grisebach published in Eur. J. Biochem. (1981), 119:115-127.

By way of illustration, this method is carried out by spectrophotometric measurement of the reducing activity of the protein (CCR or derived) by monitoring the disappearance of the cinnamoyl-CoAs at 366 nm. The reaction takes place at 30° C. over the course of 2 to 10 minutes. The composition of the reaction medium is as follows: phosphate buffer 100 mM, pH 6.25, 0.1 mM NADPH, 70 μM feruloyl-CoA, 5 to 100 μl enzymatic extract in a total volume of 500 μl.
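The arithmetic behind such an assay can be sketched as follows. The sketch converts the stated final concentrations into amounts per 500 μl reaction, and converts an absorbance slope at 366 nm into a substrate consumption rate using the Beer-Lambert law. The molar absorption coefficient is not given in the text, so it is a caller-supplied parameter; the value used in the usage lines is a placeholder for illustration only, and the function names are hypothetical.

```python
REACTION_VOLUME_L = 500e-6  # total volume stated in the text: 500 microlitres


def amount_nmol(concentration_molar: float, volume_l: float = REACTION_VOLUME_L) -> float:
    """Amount of a component (nmol) present at a given final concentration."""
    return concentration_molar * volume_l * 1e9


def substrate_consumption_rate(
    delta_a_per_min: float,
    epsilon_per_m_cm: float,
    path_cm: float = 1.0,
    volume_l: float = REACTION_VOLUME_L,
) -> float:
    """Moles of cinnamoyl-CoA consumed per minute in the cuvette.

    delta_a_per_min: slope of absorbance at 366 nm (negative as substrate disappears).
    epsilon_per_m_cm: molar absorption coefficient in M^-1 cm^-1.
        ASSUMPTION: must be supplied by the caller; the text does not state it.
    path_cm: optical path length (ASSUMPTION: 1 cm cuvette by default).
    """
    if epsilon_per_m_cm <= 0 or path_cm <= 0 or volume_l <= 0:
        raise ValueError("epsilon, path length and volume must be positive")
    # Beer-Lambert: A = epsilon * path * concentration
    concentration_rate = abs(delta_a_per_min) / (epsilon_per_m_cm * path_cm)  # mol/L/min
    return concentration_rate * volume_l


# Amounts per 500 microlitre reaction at the stated final concentrations.
nadph_nmol = amount_nmol(0.1e-3)         # 0.1 mM NADPH, about 50 nmol
feruloyl_coa_nmol = amount_nmol(70e-6)   # 70 micromolar feruloyl-CoA, about 35 nmol

# Placeholder epsilon of 10000 M^-1 cm^-1, for illustration only.
rate = substrate_consumption_rate(-0.1, epsilon_per_m_cm=10000.0)
```

Dividing the resulting rate by the amount of protein in the 5 to 100 μl of extract would give a specific activity, which is one way two proteins could be compared for "equivalent" activity.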

The invention also relates to any DNA sequence, characterized in that it contains, as the coding region:

the nucleotide sequence complementary to that represented by SEQ ID NO 1, this complementary sequence coding for an antisense mRNA which is capable of hybridizing with the mRNA which itself codes for the CCR represented by SEQ ID NO 2, that is to say the mRNA coded by the sequence represented by SEQ ID NO 1, or coded by a sequence derived from the latter, as is defined above, or

a fragment of the abovementioned complementary sequence, this sequence fragment coding for an antisense mRNA which is capable of hybridizing with the mRNA which itself codes for the CCR represented by SEQ ID NO 2, as is defined above, or

any nucleotide sequence derived from the abovementioned complementary sequence, or from the fragment of this complementary sequence as is described above, in particular by mutation and/or addition and/or suppression and/or substitution of one or more nucleotides, this derived sequence coding for an antisense mRNA which is capable of hybridizing with the abovementioned mRNA.

The present invention more particularly relates to any DNA sequence, characterized in that it contains, as the coding region:

the nucleotide sequence complementary to that represented by SEQ ID NO 3, this complementary sequence coding for an antisense mRNA which is capable of hybridizing with the mRNA which itself codes for the CCR represented by SEQ ID NO 4, that is to say the mRNA coded by the sequence represented by SEQ ID NO 3, or coded by a sequence derived from the latter, as is defined above, or

a fragment of the abovementioned complementary sequence, this sequence fragment coding for an antisense mRNA which is capable of hybridizing with the mRNA which itself codes for the CCR represented by SEQ ID NO 4, as is defined above, or

any nucleotide sequence derived from the abovementioned complementary sequence, or from the fragment of this complementary sequence as is described above, in particular by mutation and/or addition and/or suppression and/or substitution of one or more nucleotides, this derived sequence coding for an antisense mRNA which is capable of hybridizing with the abovementioned mRNA.

It goes without saying that the sequences represented by SEQ ID NO 1 and SEQ ID NO 3, the complementary sequences, the derived sequences and the sequence fragments of the invention which are mentioned above must be considered as being represented in the sense 5′→3′.

The first nucleotide of a complementary sequence in the sense 5′→3′ as is described above is thus the complement of the last nucleotide of the sequence in the sense 5′→3′ which codes for a CCR (or CCR fragment or derived protein), the second nucleotide of this complementary sequence is the complement of the last-but-one nucleotide of the sequence which codes for a CCR, and so on, up to the last nucleotide of the said complementary sequence, which is the complement of the first nucleotide of the sequence which codes for a CCR.

The mRNA coded by the abovementioned complementary sequence is such that, if this mRNA is represented in the sense 5′→3′, its first nucleotide corresponds to the last nucleotide of the sequence which codes for a CCR, and thus hybridizes with the last nucleotide of the mRNA coded by the latter, while its last nucleotide corresponds to the first nucleotide of the sequence which codes for a CCR, and thus hybridizes with the first nucleotide of the mRNA coded by the latter.
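The relationship described in the preceding paragraphs is the reverse complement. A minimal sketch, using a six-base toy sequence rather than SEQ ID NO 1 or SEQ ID NO 3, and hypothetical function names:

```python
# Complement map for DNA bases (upper and lower case).
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna_5_to_3: str) -> str:
    """Complementary sequence, itself written 5'->3'.

    Its first base is the complement of the last base of the input,
    its second base the complement of the last-but-one, and so on.
    """
    return dna_5_to_3.translate(_DNA_COMPLEMENT)[::-1]


def antisense_rna(coding_dna_5_to_3: str) -> str:
    """Antisense RNA (5'->3') corresponding to the complementary sequence."""
    return reverse_complement(coding_dna_5_to_3).upper().replace("T", "U")


coding = "ATGGCC"  # toy coding sequence, hypothetical
print(reverse_complement(coding))  # GGCCAT
print(antisense_rna(coding))       # GGCCAU
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check on any complementary sequence derived this way.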

Antisense mRNA in the text above and below is therefore understood as meaning any RNA coded by the said complementary sequence and represented in the reverse sense (3′→5′) to the sense in which the mRNA coded by the sequence which codes for a CCR (or CCR fragment or derived protein) is represented, the latter mRNA being also called sense mRNA (5′→3′).

The term antisense RNA therefore relates to an RNA sequence complementary to the sequence of bases of the messenger RNA, the term complementary being understood in the sense that each base (or a majority of the bases) of the antisense sequence (read in the sense 3′→5′) is capable of pairing with the corresponding bases (G with C, A with U) of the messenger RNA (sequence read in the sense 5′→3′).
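The pairing criterion in this definition can be sketched as a simple check: read the antisense RNA 3′→5′, line it up against the messenger RNA read 5′→3′, and count the positions where G faces C or A faces U. The function name `paired_fraction`, the equal-length requirement and the toy sequences are assumptions made for illustration.

```python
# Watson-Crick pairs allowed by the definition: G with C, A with U.
_RNA_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def paired_fraction(mrna_5_to_3: str, antisense_5_to_3: str) -> float:
    """Fraction of positions where the two RNAs can pair.

    ASSUMPTION: both RNAs have the same length and pair end to end,
    with no gaps or bulges.
    """
    if len(mrna_5_to_3) != len(antisense_5_to_3) or not mrna_5_to_3:
        raise ValueError("RNAs must be non-empty and of equal length")
    antisense_3_to_5 = antisense_5_to_3[::-1]  # read the antisense strand 3'->5'
    paired = sum(
        (m, a) in _RNA_PAIRS
        for m, a in zip(mrna_5_to_3.upper(), antisense_3_to_5.upper())
    )
    return paired / len(mrna_5_to_3)


print(paired_fraction("AUGGCC", "GGCCAU"))        # 1.0, every base pairs
print(paired_fraction("AUGGCC", "GGCCAA") > 0.5)  # True, a majority still pairs
```

A fraction of 1.0 corresponds to "each base" pairing, while any value above 0.5 corresponds to "a majority of the bases".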

The strategy of antisense RNAs in the context of the present invention is a molecular approach which is particularly suitable for the aim of modulation of the levels of lignins in plants. The antisense RNA is an RNA produced by transcription of the non-coding DNA strand (non-sense strand).

This antisense strategy is more particularly described in European Patent no. 240 208.

It is thought that the inhibition of the synthesis of a protein by the antisense strategy, namely the CCR in the present case, is the consequence of the formation of a duplex between the two complementary RNAs (sense and antisense), thus preventing the production of the protein. The mechanism remains obscure, however. The RNA-RNA complex can interfere either with a subsequent transcription, or with the maturation, transportation or translation, or even lead to a degradation of the mRNA.

A combination of these effects is also possible.

The invention also relates to any mRNA coded by a DNA sequence according to the invention, and more particularly:

the mRNA coded by the DNA sequence represented by SEQ ID NO 1, or coded by a fragment or a derived sequence, as are defined above, the said mRNA being capable of coding in its turn for the CCR present in lucerne, such as is represented by SEQ ID NO 2, or for a fragment of this CCR or a derived protein, as are defined above,

the mRNA coded by the DNA sequence represented by SEQ ID NO 3, or coded by a fragment or a derived sequence, as are defined above, the said mRNA being capable of coding in its turn for the CCR present in maize, such as is represented by SEQ ID NO 4, or for a fragment of this CCR or a derived protein, as are defined above.

The invention also relates to any antisense mRNA as defined above, characterized in that it contains nucleotides complementary to all or only part of the nucleotides which make up an mRNA as described above according to the invention, the said antisense mRNA being capable of hybridizing (or of pairing) with the latter.

In this respect, the invention more particularly relates to the antisense mRNAs coded by the DNA sequences according to the invention, containing at least one region of 50 bases homologous to those of a region of the complementary sequences of the abovementioned DNA sequences of the invention.

There is no upper limit to the size of the DNA sequences which code for an antisense RNA according to the invention; they can be as long as the messenger usually produced in cells, or indeed as long as the genomic DNA sequence which codes for the mRNA of the CCR.

Such DNA sequences which code for an antisense RNA according to the invention advantageously contain between about 100 and about 1,000 base pairs.

The invention more particularly relates to any antisense sequence containing one (or more) antisense mRNA(s) as described above, or fragment(s) of this (these) antisense mRNA(s), and one (or more) sequence(s) corresponding to one (or more) catalytic domain(s) of a ribozyme.

In this respect, the invention more particularly relates to any antisense sequence as is described above containing the catalytic domain of a ribozyme flanked on both sides by arms of about 8 bases complementary to sequences which border a motif GUX (X representing C, U or A) contained in one of the mRNAs of the invention described above (also called target RNAs) (Haseloff J. and Gerlach W. L., 1988, Nature, 334:585-591).
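Locating candidate GUX motifs in a target RNA, together with the 8-base stretches that border them, can be sketched as below. This is a simplified illustration and not a ribozyme design tool: it treats the whole GUX triplet as the motif, takes the 8 bases on either side as the bordering sequences, and reports the arms as their reverse complements. The function names, the dictionary keys and the toy target are hypothetical.

```python
import re

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string, written 5'->3'."""
    return rna.translate(_RNA_COMPLEMENT)[::-1]


def find_gux_sites(target_rna: str, arm_length: int = 8) -> list:
    """List GUX motifs (X = C, U or A) that have full-length borders on both sides.

    ASSUMPTION: the arms pair with the arm_length bases immediately
    upstream and downstream of the triplet; real designs may differ.
    """
    rna = target_rna.upper().replace("T", "U")
    sites = []
    # Lookahead so that overlapping motifs are all reported.
    for match in re.finditer(r"(?=GU[CUA])", rna):
        start = match.start()
        end = start + 3
        if start < arm_length or end + arm_length > len(rna):
            continue  # not enough bordering sequence for a full arm
        upstream = rna[start - arm_length:start]
        downstream = rna[end:end + arm_length]
        sites.append({
            "position": start,
            "motif": rna[start:end],
            "arm_pairing_upstream": rna_reverse_complement(upstream),
            "arm_pairing_downstream": rna_reverse_complement(downstream),
        })
    return sites


toy_target = "AAAAAAAA" + "GUC" + "CCCCCCCC"  # hypothetical target RNA
for site in find_gux_sites(toy_target):
    print(site["position"], site["motif"])  # 8 GUC
```

Motifs too close to either end of the target are skipped because a full 8-base arm cannot be formed there.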

The invention also relates to any DNA sequence which is capable of coding for an antisense sequence as is described above containing at least one catalytic domain of a ribozyme bonded to one or more antisense mRNA(s) of the invention, or fragment(s) of the antisense mRNA (advantageously fragments of about 8 bases as are described above).

The invention more particularly relates to:

any antisense mRNA as is described above, characterized in that it is coded by the nucleotide sequence complementary to that represented by SEQ ID NO 1, the said antisense mRNA being capable of hybridizing with the mRNA coded by the DNA sequence represented by SEQ ID NO 1,

any antisense mRNA as is described above, characterized in that it is coded by the nucleotide sequence complementary to that represented by SEQ ID NO 3, the said antisense mRNA being capable of hybridizing with the mRNA coded by the DNA sequence represented by SEQ ID NO 3.

The invention also relates to the recombinant polypeptides coded by the DNA sequences of the invention, the said recombinant polypeptides having an enzymatic activity equivalent to that of the CCRs in plants, and more particularly the recombinant CCRs coded by the sequences represented by SEQ ID NO 1 and SEQ ID NO 3, or by sequences derived from the latter according to the invention.

The invention more particularly relates to the recombinant polypeptides, and in particular the recombinant CCRs, such as are obtained by transformation of plant cells by integrating into their genome, in a stable manner, a recombinant nucleotide sequence as is defined below containing a DNA sequence according to the invention, in particular with the aid of a vector as is described below.

The expression “recombinant polypeptides” should be understood as meaning any molecule which has a polypeptide chain which is capable of being produced by genetic engineering, by the intermediary of a transcription phase of the DNA of the corresponding gene, leading to the obtaining of RNA which is subsequently transformed into mRNA (by suppression of introns), the latter being then translated by the ribosomes, in the form of proteins, the entire process being carried out under the control of suitable regulatory elements inside a host cell. The expression “recombinant polypeptides” used consequently does not exclude the possibility that the said polypeptides contain other groupings, such as glycosylated groupings.

The term “recombinant” of course indicates that the polypeptide has been produced by genetic engineering, since it results from expression, in a suitable cell host, of the corresponding nucleotide sequence which has been introduced beforehand into an expression vector used to transform the said cell host. However, this term “recombinant” does not exclude the possibility that the polypeptide is produced by a different process, for example by conventional chemical synthesis by the known methods used for synthesis of proteins, or proteolytic cleavage of molecules of larger size.

The invention more particularly relates to the CCR such as is present in lucerne cells and is represented by SEQ ID NO 2, or the CCR such as is present in maize cells and is represented by SEQ ID NO 4, the said CCRs being those obtained in an essentially pure form, by extraction and purification, from lucerne or maize, or any protein derived from the latter, in particular by addition and/or suppression and/or substitution of one or more amino acids, or any fragment resulting from the said CCRs or from their derived sequences, the said fragments and derived sequences being capable of having an enzymatic activity equivalent to that of the abovementioned CCRs.

The invention also relates to the nucleotide sequences which code for the CCR represented by SEQ ID NO 2 or SEQ ID NO 4, or any derived sequence or fragment of the latter, as are defined above, the said nucleotide sequences being characterized in that they correspond to all or part of the sequences represented by SEQ ID NO 1 or SEQ ID NO 3 respectively, or to any sequence derived from the latter by degeneracy of the genetic code, and being nevertheless capable of coding for the CCRs or derived sequence or fragment of the latter, as are defined above.
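Degeneracy of the genetic code means that different nucleotide sequences can code for the same protein. A minimal sketch with the standard genetic code, using two toy nine-base sequences (hypothetical, not taken from the sequence listing) that differ at two positions yet translate to the same peptide:

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence in frame, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


variant_a = "ATGGCTAAA"  # ATG GCT AAA
variant_b = "ATGGCCAAG"  # ATG GCC AAG, synonymous third-base changes
print(translate(variant_a))  # MAK
print(translate(variant_b))  # MAK
```

Both variants encode Met-Ala-Lys, so a sequence that differs from SEQ ID NO 1 or SEQ ID NO 3 only by such synonymous changes would still code for the same CCR.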

The invention also relates to the complexes formed between the antisense mRNAs as are described above and the mRNAs according to the invention which are capable of coding for all or part of a CCR in plants.

The invention more particularly relates to the complex formed between the mRNA coded by the sequence SEQ ID NO 1 and the antisense mRNA coded by the sequence complementary to the sequence SEQ ID NO 1, and to the complex formed between the mRNA coded by the sequence SEQ ID NO 3 and the antisense mRNA coded by the sequence complementary to the sequence SEQ ID NO 3.

The invention more particularly relates to any recombinant nucleotide sequence (or recombinant DNA), characterized in that it contains at least one DNA sequence according to the invention chosen from those described above, the said DNA sequence being inserted into a heterologous sequence.

The invention more particularly relates to any recombinant nucleotide sequence as is described above containing, as the coding region, the nucleotide sequence represented by SEQ ID NO 1, or by SEQ ID NO 3, or any fragment or a nucleotide sequence derived from the latter, as are defined above, the said nucleotide sequences or the said fragment being inserted into a heterologous sequence, and being capable of coding for the CCR represented by SEQ ID NO 2, or by SEQ ID NO 4 respectively, or for a fragment of these CCRs, or for a protein derived from the latter, as are defined above.

The invention more particularly also relates to any recombinant nucleotide sequence containing, as the coding region, a nucleotide sequence complementary to that represented by SEQ ID NO 1, or by SEQ ID NO 3, or any fragment or any nucleotide sequence derived from this complementary sequence, as defined above, the said complementary sequences or the said fragment being inserted into a heterologous sequence and being capable of coding for an antisense mRNA which is capable of hybridizing with all or part of the mRNA which codes for a CCR in plants, and more particularly with all or part of the mRNA which codes for the CCR represented by SEQ ID NO 2, or by SEQ ID NO 4.

The recombinant DNAs according to the invention are further characterized in that they contain the elements necessary to regulate the expression of the nucleotide sequence which codes for a CCR, or of its complementary sequence which codes for an antisense mRNA according to the invention, in particular a promoter and a terminator of the transcription of these sequences.

Among the various promoters which can be used in the constructions of recombinant DNAs according to the invention there may be mentioned:

the endogenous promoter which controls the expression of the CCR in a plant, in particular the promoter situated upstream of the DNA sequence represented by SEQ ID NO 5 which codes, in eucalyptus, for the CCR represented by SEQ ID NO 6, or

promoters of a type which confers high expression, examples: 35S CaMV (described in Benfey et al. (1990), EMBO J., 9 (6), 1677-1684), EF1α (promoter of the gene of an elongation factor in protein synthesis, described by Curie et al. (1991), Nucl. Acids Res., 19, 1305-1310),

promoters of a type specific for particular expression in individual tissues, examples: promoter CAD (described by Feuillet C. (1993), Thesis of the University of Toulouse III), promoter GRP 1-8 (described by Keller and Baumgartner, (1991), Plant Cell, 3, 1051-1061) for expression in specific vascular tissues.

The invention also relates to any recombinant nucleotide sequence as is described above, also containing, as the coding region, at least one nucleotide sequence which codes for all or part of an mRNA which itself codes for an enzyme other than the CCR, which is found to be involved in a stage of the biosynthesis of lignins in plants, in particular the mRNA which codes for cinnamyl alcohol dehydrogenase (CAD), or also containing, as the coding region, at least one nucleotide sequence which codes for all or part of an antisense mRNA which is capable of hybridizing with the abovementioned mRNA, in particular with the mRNA which codes for CAD.

The abovementioned recombinant nucleotide sequences of the invention are advantageously obtained from vectors, into which are inserted the DNA sequences which code for an enzyme necessary for the biosynthesis of lignins in plants.

The abovementioned vectors are digested with the aid of suitable restriction enzymes in order to recover the said DNA sequences which are inserted there.

The latter are then inserted downstream of a suitable promoter, and upstream of a suitable terminator of expression, within the recombinant DNAs according to the invention.

The invention more particularly relates to the recombinant DNAs which contain the sequence represented by SEQ ID NO 1 or that represented by SEQ ID NO 3, such as are obtained by digestion of the abovementioned vectors, recovery of the DNA sequence of the invention and insertion of the latter in the sense 5′→3′ within a heterologous DNA sequence containing a promoter and a terminator of the expression of the said sequence.

The invention also more particularly relates to the recombinant DNAs containing the sequence complementary to the sequence represented by SEQ ID NO 1 or that represented by SEQ ID NO 3, such as are obtained by digestion of the abovementioned vectors, recovery of the DNA sequence of the invention and insertion of the latter in the reverse sense, that is to say in the sense 3′→5′, within a heterologous DNA sequence containing a promoter and a terminator of the expression of the complementary sequence.

By way of example of the terminator which can be used in such constructions there may be mentioned the 3′ end of the gene of nopaline synthase of Agrobacterium tumefaciens.

Thus, generally, the recombinant nucleotide sequences according to the invention containing a DNA sequence which codes for a CCR (or a fragment of CCR or a derived protein), and/or other enzymes necessary for the biosynthesis of lignins, are obtained by recovery of the said DNA sequence from the abovementioned vectors and insertion of this sequence into the heterologous sequence, while the recombinant nucleotide sequences containing a DNA sequence which codes for an antisense mRNA according to the invention are obtained by recovery of the abovementioned DNA sequence and insertion of the latter in the reverse sense into the said heterologous sequence.
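The difference between a sense and an antisense construction can be sketched in silico as the orientation in which the recovered sequence is placed between promoter and terminator. The promoter and terminator strings below are short toy placeholders, not the real 35S CaMV or nopaline synthase sequences, and the function names are hypothetical. Restriction sites and cloning details are ignored.

```python
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complementary strand of a DNA sequence, written 5'->3'."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def build_cassette(promoter: str, insert: str, terminator: str,
                   antisense: bool = False) -> str:
    """Top strand (5'->3') of a promoter-insert-terminator cassette.

    antisense=False: insert placed in the sense 5'->3' (codes for the mRNA).
    antisense=True: insert placed in the reverse sense, so that the strand
    read from the promoter is the complementary sequence.
    """
    body = reverse_complement(insert) if antisense else insert
    return promoter + body + terminator


TOY_PROMOTER = "TATAAA"    # placeholder only
TOY_TERMINATOR = "AATAAA"  # placeholder only
toy_cdna = "ATGGCC"        # placeholder for a recovered cDNA fragment

sense_construct = build_cassette(TOY_PROMOTER, toy_cdna, TOY_TERMINATOR)
antisense_construct = build_cassette(TOY_PROMOTER, toy_cdna, TOY_TERMINATOR, antisense=True)
print(sense_construct)      # TATAAAATGGCCAATAAA
print(antisense_construct)  # TATAAAGGCCATAATAAA
```

In the antisense construction the promoter drives transcription of the complementary sequence, which is what yields an antisense mRNA capable of hybridizing with the endogenous messenger.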

By way of illustration, all or part of the complementary DNA (cDNA)represented by SEQ ID NO 1 or SEQ ID NO 3 can be used for constructionof the abovementioned recombinant DNAs, or also all or part of thegenomic clone corresponding to a CCR (which corresponds to theabovementioned cDNAs+any introns). This genomic clone can be obtainedusing the cDNAs as probes to screen a genome bank, the latter itselfbeing obtained by the method described by Sambrook, Fritsch andManiatis, Molecular Cloning Laboratory Manual, Cold Spring HarbourLaboratory Press, 1989.

The invention also relates to any recombinant vector which can be used for the transformation of plants, characterized in that it contains a recombinant nucleotide sequence chosen from those described above, according to the invention, which is integrated into one of the sites of its genome which are not essential for its replication.

Among the abovementioned recombinant vectors which can be used for the transformation of plants, there may be mentioned: binary vectors derived from pBIN 19 (Bevan et al., (1984), Nucl. Acids Res., 12 (22), 8711-8721).

Examples of the construction of recombinant vectors according to the invention are described in the detailed description of the invention which follows.

The present invention also relates to a process for regulating the biosynthesis of lignins in plants, either by reducing or by increasing the amounts of lignins produced with respect to the normal amounts of lignins produced in these plants, the said process comprising a stage of transformation of cells of these plants with the aid of a vector containing:

the nucleotide sequence represented by SEQ ID NO 1 or by SEQ ID NO 3 or a fragment of the abovementioned nucleotide sequences, this fragment coding for an mRNA, this mRNA itself coding for a fragment of a CCR in plants, this CCR fragment having an enzymatic activity equivalent to that of the CCR represented by SEQ ID NO 2 or by SEQ ID NO 4, or of a nucleotide sequence derived from the abovementioned nucleotide sequences, or derived from the abovementioned fragment, in particular by mutation and/or addition and/or suppression and/or substitution of one or more nucleotides, this derived sequence coding for an mRNA, this mRNA itself coding for a derived protein having an enzymatic activity equivalent to that of at least one of the abovementioned CCRs, or

a nucleotide sequence complementary to all or part of the nucleotide sequences represented by SEQ ID NO 1 or by SEQ ID NO 3 which code for an mRNA, or the fragment of these sequences, or the sequence derived from the latter, as are defined above, this complementary sequence coding for an antisense mRNA which is capable of hybridizing with one of the abovementioned mRNAs,

the said transformation being carried out in particular with the aid of a vector as is described above.

The invention more particularly relates to a process for reducing the amount of lignins produced by biosynthesis in plants, this process being carried out by transformation of the genome of these plants, incorporating:

at least one DNA sequence according to the invention as is described above which codes for an antisense mRNA which is capable of hybridizing with all or part of the mRNA which codes for the CCR represented by SEQ ID NO 2 or SEQ ID NO 4, or for a protein derived from the latter as is defined above,

and, where appropriate, at least one DNA sequence which codes for an antisense mRNA which is capable of hybridizing with an mRNA which codes for an enzyme other than the CCR, which is found to be involved in a stage of the biosynthesis of lignins in plants, in particular the mRNA which codes for CAD,

the said transformation being carried out:

either with the aid of a recombinant vector as is described above, containing a DNA sequence which codes for an antisense mRNA which is capable of hybridizing with the mRNA which codes for the CCR or for a derived protein, as is defined above, and, where appropriate, containing one or more DNA sequence(s) which code(s) for an antisense mRNA which is capable of hybridizing with an mRNA which codes for an enzyme other than the CCR as is defined above,

or with the aid of several recombinant vectors, at least one of which contains a DNA sequence which codes for an antisense mRNA which is capable of hybridizing with the mRNA which codes for the CCR or for a derived protein, as is defined above, while the other recombinant vector(s) contain(s) a DNA sequence which codes for an antisense mRNA which is capable of hybridizing with an mRNA which codes for an enzyme other than the CCR, as is defined above.

Another process for reducing the amount of lignins produced by biosynthesis in plants is that realized by transformation of the genome of these plants, incorporating:

at least one DNA sequence according to the invention represented by SEQ ID NO 1 or SEQ ID NO 3, or a fragment or a sequence derived from the latter, as are defined above,

and, where appropriate, at least one DNA sequence which codes for all or part of an enzyme other than the CCR, which is found to be involved in a stage of the biosynthesis of lignins in plants, in particular a DNA sequence which codes for all or part of CAD,

the said transformation being realized:

either with the aid of a recombinant vector as is described above containing the abovementioned DNA sequence according to the invention, or a fragment or a sequence derived from the latter, as are defined above, and, where appropriate, containing one or more DNA sequence(s) which code(s) for all or part of an enzyme other than the CCR, as is defined above,

or with the aid of several recombinant vectors, at least one of which contains an abovementioned DNA sequence according to the invention, or a fragment or a sequence derived from the latter, as are defined above, while the other recombinant vector(s) contain(s) a DNA sequence which codes for all or part of an enzyme other than the CCR, as is defined above.

The latter method makes use of the co-suppression mechanism. Co-suppression has been observed when copies of the endogenous gene have been introduced into the genome. Although the mechanism of co-suppression is currently unknown, one of the most frequent hypotheses adopted is that negative regulation of the expression of the gene would come from production of a small proportion of antisense RNA derived from a transgene through reading of the “bad” strand of the transgene (Grierson et al., Trends Biotech., 9: 122-123).

The invention also relates to a process for reducing the amount of lignins produced by biosynthesis in plants, this process being carried out by transformation of the genome of these plants incorporating a DNA sequence as is described above according to the invention, which codes for an antisense sequence containing one (or more) catalytic domain(s) of a ribozyme bonded to one (or more) antisense mRNA(s), or fragment(s) of the antisense mRNA of the invention, the said transformation being carried out with the aid of a recombinant vector containing a recombinant nucleotide sequence according to the invention, itself containing the abovementioned DNA sequence.

It is important to note that the abovementioned methods make it possible to obtain transformed plants which have different levels of reduction of the CCR activity (depending on the level of insertion of the DNA sequence which codes for the antisense mRNA, the number of copies of this DNA sequence integrated into the genome, etc.), and therefore different lignin contents.

The choice of transformants will therefore allow controlled modulation of the contents of lignins compatible with a normal development of the plant.

Generally, considering that the normal average content of lignins of a plant varies between about 15% and about 35% by weight of dry matter, the reduction in the content of lignins resulting from carrying out one of the abovementioned processes is advantageously such that the plants thus transformed have an average content of lignins which varies between about 10% and about 30%, or even between about 12% and about 32%.

By way of illustration, the content of lignins in a plant can be measured by a variant of the method of Johnson et al., (1961), T.A.P.P.I., 44, 793-798, which is described in detail in Alibert and Boudet (1979), Physiol. Veg., 17 (1), 67-74, the main stages of which are the following: after obtaining a benzene-alcohol powder of the plant material, which contains the lignins, the lignins are solubilized with acetyl bromide and analysed as a function of their absorption in ultraviolet light.

The invention more particularly relates to the use of the abovementioned processes for reducing the contents of lignins in plants to obtain genetically transformed fodder plants which have reduced lignin contents with respect to the normal contents of lignins in these plants, the digestibility of which is therefore found to be improved with respect to these same non-transformed plants.

Among the main fodder plants which can be transformed in the context of the present invention there may be mentioned: lucerne, fescue, maize for silage, etc.

The invention also relates to the use of the abovementioned processes for reducing the contents of lignins in plants to obtain genetically transformed plants, and more particularly trees, having reduced lignin contents with respect to the normal contents of lignins in these plants, these plants or trees being particularly advantageous for use in the context of the production of paper pulp.

A third potential field of application of the abovementioned processes for negative regulation of the expression of the gene of the CCR relates to stimulation of the growth of the transformed plants. Various arguments (Sauter and Kende, 1992, Plant and Cell Physiology, 33(8):1089) emphasize that early and rapid lignification is a restraint on cell enlargement and thus on the growth of plants. The use of the abovementioned processes is thus capable of allowing better growth and therefore better yields for the plants with reduced lignification transformed in this way.

The invention also relates to a process for increasing the amount of lignins produced by biosynthesis in plants, this process being carried out by transformation of the genome of these plants, incorporating:

at least one DNA sequence according to the invention represented by SEQ ID NO 1 or SEQ ID NO 3, or a fragment or a sequence derived from the latter, as are defined above,

and, where appropriate, at least one DNA sequence which codes for all or part of an enzyme other than the CCR, which is found to be involved in a stage of the biosynthesis of lignins in plants, in particular a DNA sequence which codes for all or part of CAD,

the said transformation being carried out:

either with the aid of a recombinant vector as is described above containing the abovementioned DNA sequence according to the invention, or a fragment or a sequence derived from the latter, as are defined above, and, where appropriate, containing one or more DNA sequence(s) which code(s) for all or part of an enzyme other than the CCR, as is defined above,

or with the aid of several recombinant vectors, at least one of which contains an abovementioned DNA sequence according to the invention, or a fragment or a sequence derived from the latter, as are defined above, while the other recombinant vector(s) contain(s) a DNA sequence which codes for all or part of an enzyme other than the CCR, as is defined above.

Generally, always considering that the normal average content of lignins in a plant varies between about 15% and about 35% by weight of dry matter, the increase in the content of lignins resulting from carrying out the abovementioned process is advantageously such that the plants thus transformed have an average content of lignins which varies between about 20% and about 40%, or even between about 18% and about 38%.

The invention more particularly relates to the use of the abovementioned process for increasing the content of lignins in plants (also called a process for over-expression of the gene of the CCR) to obtain genetically transformed plants having increased lignin contents with respect to the normal contents of lignins in these plants, of which the resistance properties to environmental attacks, in particular to parasitic attacks, are thus found to be improved with respect to these same non-transformed plants. In the latter case, it is particularly advantageous to use, in combination with the CCR gene, or a derived sequence, in the abovementioned vectors, specific promoters which are expressed particularly in the tissues of the surface and/or in response to wounding.

Furthermore, the invention also relates to the use of the abovementioned process of over-expression of the gene of the CCR for improving the growth of the plants genetically transformed in this way, in particular in certain sectors, such as horticulture or arboriculture, where it is desirable to obtain plants of reduced size.

Finally, the benzene rings of lignin have a higher intrinsic energy than the aliphatic chains of the glucose residues of cellulose. The increase by the abovementioned process of the invention in the proportion of lignin in plants used as fuels thus allows an improvement in the energy potential of these combustible plants.

In the two cases of negative regulation or of over-expression of the CCR, it is entirely foreseeable that the modulation of this activity has repercussions on the content of lignins in the transformed plants. In fact, the CCR, of which the level of activity is very low in the plant, seems to be the regulatory enzyme of lignin synthesis.

Regarding the transformation techniques used to carry out one of the processes of the invention which are described above, the following techniques will advantageously be used:

A) The technology of transformation by means of the Ti plasmid of Agrobacterium tumefaciens described by Bevan (1984) Nucleic Acids Research, 12, 8711-8721. This essentially makes use of the method of co-culture, and involves a co-transformation with a selection gene to be able to identify the transformants.

It is particularly applicable to dicotyledons, e.g.: tobacco, lucerne, rape.

B) The technique of direct transfer of genes by biolistics described in detail by Zumbrum et al., 1989, Technique 1, 204-216, and Sanford et al., 1991, Technique 3, 3-16.

This technique involves combination of the recombinant DNA according to the invention with microparticles of gold or tungsten, which are propelled with the aid of a particle gun onto the tissue to be transformed. It will be applied in particular to the transformation of species which are unaffected by Agrobacteria.

In the two abovementioned cases, verification of the presence of the recombinant DNA according to the invention will be carried out by hybridization experiments of the Southern type and gene amplification (polymerase chain reaction) with the aid of probes and oligonucleotide primers resulting from, in particular, the sequence SEQ ID NO 1 or SEQ ID NO 3.

The invention also relates to the cells of plants transformed by a vector according to the invention, in particular by the techniques described above, and containing a DNA sequence according to the invention integrated in their genome in a stable manner.

The invention also relates to the transformed plants such as are obtained by culture of the abovementioned transformed cells.

The transformed plants can then be propagated sexually or vegetatively in vitro or in natura.

The invention also relates to the fragments of plants, in particular fruits, seeds or pollen, transformed by incorporation into their genome of a DNA sequence according to the invention with the aid of the abovementioned recombinant vectors.

The invention also relates to antibodies directed against the recombinant polypeptides of the invention, and more particularly those directed against the abovementioned recombinant CCRs.

Such antibodies can be obtained by immunization of an animal with these polypeptides, followed by recovery of the antibodies formed.

It goes without saying that this production is not limited to polyclonal antibodies.

It is also applied to any monoclonal antibody produced by any hybridoma which is capable of being formed by conventional methods from the splenic cells of an animal, in particular the mouse or rat, immunized against one of the purified polypeptides of the invention on the one hand and cells of a suitable myeloma on the other hand, and of being selected by its capacity to produce monoclonal antibodies which recognize the abovementioned polypeptide initially used for immunization of the animals.

The invention also relates to the use of the abovementioned antibodies directed against the recombinant polypeptides of the invention for carrying out a method for detection or analysis of the CCRs in plants from samples taken from the latter.

It is appropriate to state that the nucleotide sequences represented by SEQ ID NO 5, SEQ ID NO 7, SEQ ID NO 9 and SEQ ID NO 11 which code, respectively, for the CCR of eucalyptus represented by SEQ ID NO 6, the CCR of poplar represented by SEQ ID NO 8, the CCR of fescue represented by SEQ ID NO 10 and the CCR of tobacco represented by SEQ ID NO 12, as well as the sequence represented by SEQ ID NO 13 which codes for the protein represented by SEQ ID NO 14 derived from the abovementioned CCR of eucalyptus, are excluded from the nucleotide sequences of the invention and their abovementioned use.

Furthermore, the sequences complementary to the nucleotide sequences SEQ ID NO 5, SEQ ID NO 7, SEQ ID NO 9, SEQ ID NO 11 and SEQ ID NO 13, and also the fragments or sequences derived from these nucleotide sequences or from their complementary sequences, inasmuch as these fragments and derived sequences are identical to fragments and derived sequences, as are defined above, of the nucleotide sequences represented by SEQ ID NO 1 and SEQ ID NO 3 or of their complementary sequences, are excluded from the nucleotide sequences of the invention and their abovementioned use.

The following are also excluded from the context of the present invention:

the mRNAs coded by the DNA sequences represented by SEQ ID NO 5, SEQ ID NO 7, SEQ ID NO 9, SEQ ID NO 11 and SEQ ID NO 13, or coded by a fragment or a sequence derived from these DNA sequences, inasmuch as this fragment or derived sequence are identical to fragments and derived sequences, as are defined above, of the sequences represented by SEQ ID NO 1 and SEQ ID NO 3,

the antisense mRNAs made up of nucleotides complementary to the abovementioned mRNAs,

the polypeptides represented by SEQ ID NO 6, SEQ ID NO 8, SEQ ID NO 10, SEQ ID NO 12 and SEQ ID NO 14, and also any fragment or sequence derived from the abovementioned polypeptides, inasmuch as this fragment or derived sequence are identical to the fragments and derived sequences, as are defined above, of the polypeptide sequences represented by SEQ ID NO 2 and SEQ ID NO 4.

The invention will be detailed further in the description which follows for the preparation of the CCR in purified form in eucalyptus, and of the cDNA which codes for the CCR of eucalyptus, lucerne and maize.

A) Preparation of the Purified CCR of Eucalyptus and of the cDNA Which Codes for a CCR of Eucalyptus

I Purification of the CCR of Eucalyptus

The CCR has been the subject of a very limited number of studies. Among the few publications relating to it, there may be mentioned:

Wengenmayer H., Ebel J., Grisebach H., 1976. Enzymatic synthesis of lignin precursors, purification and properties of a cinnamoyl-CoA: NADPH reductase from cell suspension cultures from soybean (Glycine max), Eur. J. Biochem., 65, 529-536.

Luderitz T., Grisebach H., 1981. Enzymatic synthesis of lignin precursors, comparison of cinnamoyl-CoA reductase and cinnamyl alcohol: NADP dehydrogenase from spruce (Picea abies L.) and soybean (Glycine max L.), Eur. J. Biochem., 119: 115-127.

Sarni F., Grand C., Boudet A. M., 1984. Purification and properties of cinnamoyl-CoA reductase and cinnamyl alcohol dehydrogenase from poplar stems (Populus x euramericana). Eur. J. Biochem., 139: 259-265.

The work described below has contributed to the definition of an original, simple and rapid protocol for the purification of the CCR of eucalyptus. This protocol is also more effective than those described previously in the literature. In fact, it has allowed, for the first time, the preparation of enzyme purified to homogeneity in amounts sufficient to obtain internal peptide sequences and to lead in time to cloning of the corresponding cDNA.

All the stages of purification of the CCR were carried out at 4° C.

1. Preparation of a Crude Extract of the Xylem of Eucalyptus.

The plant material was obtained by “scraping” a xylem-enriched tissue fraction from branches of Eucalyptus gunnii aged 5 years.

300 g xylem, frozen beforehand in liquid nitrogen, were reduced to a powder with the aid of a coffee grinder. The ground material thus obtained was homogenized in one litre of extraction buffer (100 mM Tris-HCl pH 7.6, 2% PEG 6000, 5 mM DTT, 2% PVPP), filtered over two layers of Miracloth, and brought to 30% saturation in ammonium sulphate. After centrifugation at 15,000×g for 30 minutes, the sediment obtained is resuspended in 60 ml of buffer 1 [20 mM Tris-HCl pH 7.5, 5 mM DTT (dithiothreitol), 5% ethylene glycol]. The extract thus obtained is “clarified” by centrifugation at 10,000×g for 15 min, and then desalinated by passage over Sephadex G25 equilibrated in buffer 1.

2. Affinity Chromatography on Red Sepharose.

The crude desalinated extract is deposited on a “Red Sepharose” affinity column (1.5×19 cm, Pharmacia), equilibrated in buffer 1. After a first rinsing of the column with 50 ml buffer 1, the proteins are eluted by a linear gradient of Tris from 20 mM to 1.5 M Tris-HCl pH 7.5, containing 5 mM DTT and 5% ethylene glycol. The total volume of the gradient is 200 ml and the flow rate is 36 ml/h. The fractions having a CCR activity are combined and desalinated by passage over a Sephadex G25 column equilibrated in buffer 1.

3. Anion Exchange Chromatography on MonoQ.

The fractions thus combined and desalinated are chromatographed over a MonoQ anion exchange column (HR 5/5, Pharmacia). Elution of the proteins is carried out by application of a linear gradient from 20 to 300 mM of Tris-HCl pH 7.5 containing 5% ethylene glycol and 5 mM DTT. The total volume of the gradient is 50 ml and the flow rate is 1 ml/min. As in the preceding stage, the fractions containing the active CCR enzyme are combined and desalinated, but in this case the equilibration buffer of the Sephadex G25 columns is a 20 mM phosphate buffer pH 7.6 containing 5 mM DTT (buffer 2).
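
The two linear Tris-HCl gradients described for the Red Sepharose and MonoQ stages can be sketched numerically as follows. This is only an illustrative calculation, assuming an ideal linear gradient with no mixing delay; the function names are hypothetical and are not part of the protocol.

```python
def gradient_concentration(volume_ml, start_mm, end_mm, total_volume_ml):
    """Tris-HCl concentration (mM) once volume_ml of an ideal linear gradient has been delivered."""
    if not 0 <= volume_ml <= total_volume_ml:
        raise ValueError("volume lies outside the gradient")
    fraction = volume_ml / total_volume_ml
    return start_mm + fraction * (end_mm - start_mm)


def gradient_duration_min(total_volume_ml, flow_ml_per_min):
    """Time (minutes) needed to deliver the whole gradient at a constant flow rate."""
    return total_volume_ml / flow_ml_per_min


# Red Sepharose stage: 20 mM -> 1.5 M (1500 mM) over 200 ml at 36 ml/h
red_midpoint_mm = gradient_concentration(100, 20, 1500, 200)
red_duration_min = gradient_duration_min(200, 36 / 60)

# MonoQ stage: 20 mM -> 300 mM over 50 ml at 1 ml/min
monoq_midpoint_mm = gradient_concentration(25, 20, 300, 50)
monoq_duration_min = gradient_duration_min(50, 1)

print(f"Red Sepharose: {red_midpoint_mm:.0f} mM at mid-gradient, {red_duration_min:.0f} min in total")
print(f"MonoQ: {monoq_midpoint_mm:.0f} mM at mid-gradient, {monoq_duration_min:.0f} min in total")
```

Under these assumptions the Red Sepharose gradient takes roughly five and a half hours, against 50 minutes for the MonoQ gradient.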

4. Affinity Chromatography on “Mimetic Red”

The group of CCR fractions thus obtained is deposited on a Mimetic Red 2A6XL column (ACL, Cambridge). The column is washed beforehand with 30 ml buffer 2 containing 8 mM NAD. The aim of this washing is to eliminate enzymes which function specifically with NAD as a cofactor, such as malate dehydrogenase, which is copurified with the CCR in the preceding stages. Specific elution of the CCR is obtained by application of an NADP gradient (15 ml) of 0-8 mM in buffer 2. The fractions containing the pure and active CCR are stored at −80° C., after addition of a stabilizer (ethylene glycol to a final concentration of 5%).

The purified enzyme thus obtained has a specific activity of 451 nKat/mg protein, using feruloyl-CoA as the substrate. The yield obtained (36 μg pure protein per 300 g plant starting material) does not reflect the proportion of CCR in planta: in an effort to eliminate as much contamination as possible at each purification stage, only the fractions having a very high CCR activity are carried forward to the following stage. The purification factor obtained by this protocol is 282.
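
The purification figures quoted above can be cross-checked with a short calculation. The sketch below assumes the usual definition of the purification factor (specific activity of the pure enzyme divided by that of the crude extract); the text does not state the definition used, and the function names are hypothetical.

```python
def total_activity_nkat(specific_activity_nkat_per_mg, protein_mg):
    """Total activity recovered = specific activity x mass of protein."""
    return specific_activity_nkat_per_mg * protein_mg


def crude_specific_activity(pure_specific_activity, purification_factor):
    """Specific activity of the starting extract implied by the purification factor
    (assumption: factor = pure specific activity / crude specific activity)."""
    return pure_specific_activity / purification_factor


PURE_SPECIFIC_ACTIVITY = 451.0   # nKat/mg protein, feruloyl-CoA as substrate
PURE_PROTEIN_MG = 0.036          # 36 micrograms of pure protein
XYLEM_G = 300.0                  # plant starting material
PURIFICATION_FACTOR = 282.0

recovered = total_activity_nkat(PURE_SPECIFIC_ACTIVITY, PURE_PROTEIN_MG)
crude = crude_specific_activity(PURE_SPECIFIC_ACTIVITY, PURIFICATION_FACTOR)
yield_ug_per_g = PURE_PROTEIN_MG * 1000 / XYLEM_G

print(f"activity recovered in the pure fraction: {recovered:.1f} nKat")
print(f"implied crude specific activity: {crude:.2f} nKat/mg")
print(f"protein yield: {yield_ug_per_g:.2f} micrograms per g xylem")
```

On these assumptions the pure fraction holds about 16 nKat in total, and the crude extract would have had a specific activity of about 1.6 nKat/mg protein.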

II Characterization of the CCR

The CCR of eucalyptus is a monomer of 38 kD, as demonstrated by convergent results obtained for the size of the native enzyme by exclusion chromatography over Superose 6 (Pharmacia) and for the size of the monomer sub-unit on denaturing electrophoresis gel. The isoelectric point, estimated by chromatography over MonoP (Pharmacia), is close to 7.

Investigation of the optimum pH and buffer shows that measurement of the CCR activity as was initially described (Luderitz and Grisebach, 1981) is well suited to measurement of the CCR activity of eucalyptus (100 mM phosphate buffer, pH 6.25).

The purity of the CCR, which appears as a single band on one-dimensional electrophoresis gel (SDS-PAGE), was confirmed by the single spot obtained after two-dimensional electrophoresis and staining with silver.

III Preparation of the cDNA Which Codes for the CCR of Eucalyptus

In order to avoid any problem of undetectable residual contamination, the pure enzyme was subjected to preparative electrophoresis under semi-denaturing conditions and digested in situ in the gel. The digestion was carried out with the aid of endolysine C, which cuts proteins specifically after lysine residues, allowing the preparation of relatively long peptides. The peptides resulting from the digestion were separated by reverse phase HPLC, and some of them were sequenced with the aid of a protein microsequencer (Applied Biosystems 470). The sequences of these internal peptides are shown below:

peptide 8 (a) (SEQ ID NO: 15) Asn-Trp-Tyr-Cys-Tyr-Gly-Lys

(b) (SEQ ID NO: 16) His-Leu-Pro-Val-Pro-X-Pro-Pro-Glu-Asp-Ser-Val-Arg

X representing any amino acid

peptide 10 (SEQ ID NO: 17) Thr-Tyr-Ala-Asn-Ser-Val-Gln-Ala-Tyr-Val-His-Val-Lys

peptide 13 (SEQ ID NO: 18) Gly-Cys-Asp-Gly-Val-Val-His-Thr-Ala-Ser-Pro-Val-Thr-Asp-Asp

peptide 17 (SEQ ID NO: 19) Leu-Arg-Asp-Leu-Gly-Leu-Glu-Phe-Thr-Pro-Val-Lys

peptide 18 (SEQ ID NO: 20) Gly-Asp-Leu-Met-Asp-Tyr-Gly-Ser-Leu-Glu-Glu-Ala-Ile-Lys

The cDNA which codes for the CCR was obtained by screening, with the aid of oligonucleotides, a cDNA bank constructed in the phage λ ZAPII (commercially available vector, Stratagene) from messenger RNAs extracted from the xylem of Eucalyptus gunnii. 600,000 phages were screened with the aid of a group of degenerate oligonucleotides labelled at the 3′ end with ³²P with the aid of a terminal transferase. The oligonucleotide sequences used for the screening were determined from the abovementioned internal peptide sequences. Since these peptides were generated by cutting with endolysine C, a lysine was added in the first position to allow production of oligonucleotides of less degeneracy. Lysine is coded by only two codons, which places it among the amino acids with the least degenerate code, and it is consequently well suited to the preparation of oligonucleotides from peptide sequences.

The oligonucleotide sequences used for screening the cDNA bank of eucalyptus which are derived from the amino acids underlined (I = inosine) are indicated below:

peptide 8 (a) (SEQ ID NO: 21) Lys-Asn-Trp-Tyr-Cys-Tyr-Gly-Lys

oligonucleotide 8 (SEQ ID NO: 22) AA(A/G)AA(C/T)TGGTA(C/T)TG(C/T)TA(T/C)GGIAA

peptide 13 (SEQ ID NO: 23) Lys-Gly-Cys-Asp-Gly-Val-Val-His-Thr-Ala-Ser-Pro-Val-Thr-Asp-Asp

oligonucleotide 13 (SEQ ID NO: 24) AA(G/A)GGITG(C/T)GA(C/T)GGIGTIGTICA

peptide 17 (SEQ ID NO: 25) Lys-Leu-Arg-Asp-Leu-Gly-Leu-Glu-Phe-Thr-Pro-Val-Lys

oligonucleotide 17 (SEQ ID NO: 26) GA(G/A)TT(C/T)ACICCIGTIAA

peptide 18 (SEQ ID NO: 27) Lys-Gly-Asp-Leu-Met-Asp-Tyr-Gly-Ser-Leu-Glu-Glu-Ala-Ile-Lys

oligonucleotide 18 (SEQ ID NO: 28) AA(G/A)GGIGA(C/T)(C/T)TIATGGA(C/T)TA(C/T)GG
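
The degeneracy of such mixed oligonucleotides, that is the number of distinct sequences present in the mixture, can be counted directly from the bracket notation used above. The sketch below is a minimal illustration applied to SEQ ID NO 22 and SEQ ID NO 24; it assumes that inosine (I) counts as a single base, since it is used precisely to avoid multiplying the mixture, and the helper names are hypothetical.

```python
import re
from math import prod


def parse_degenerate(oligo):
    """Split e.g. 'AA(A/G)T' into one entry per position: ['A', 'A', 'AG', 'T']."""
    tokens = re.findall(r"\(([ACGTI/]+)\)|([ACGTI])", oligo)
    return [alternatives.replace("/", "") if alternatives else base
            for alternatives, base in tokens]


def degeneracy(oligo):
    """Number of distinct sequences in the mixture (inosine counted as one base)."""
    return prod(len(position) for position in parse_degenerate(oligo))


# Keys are the SEQ ID NO of the oligonucleotides listed in the text
OLIGOS = {
    22: "AA(A/G)AA(C/T)TGGTA(C/T)TG(C/T)TA(T/C)GGIAA",
    24: "AA(G/A)GGITG(C/T)GA(C/T)GGIGTIGTICA",
}

for seq_id, oligo in OLIGOS.items():
    positions = parse_degenerate(oligo)
    print(f"SEQ ID NO {seq_id}: {len(positions)} nt, degeneracy {degeneracy(oligo)}")
```

The added lysine contributes only a twofold choice, AA(A/G), at the first codon, which is why it extends the probe at little cost in degeneracy.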

The hybridization conditions used for the screening are as follows: the prehybridization is carried out for 6 to 7 hours in 5×SSPE, 0.25% skimmed milk powder and 0.05% SDS (sodium dodecyl sulphate) at 42° C. The hybridization is carried out in this same solution in the presence of the 4 oligonucleotides labelled at the 3′ end with ddATPα³²P, for 24 hours at 42° C. At the end of these 24 hours of hybridization, the filters are washed three times for 15 minutes in 2×SSC and 0.1% SDS and then brought into contact with an autoradiography film for 24 hours at −80° C. The phages which hybridize with the group of oligonucleotides were purified by 2 supplementary screening cycles (“plate purification”). Once purified, the six positive clones were tested with each of the oligonucleotides taken independently. One phage reacted positively with the 4 oligonucleotides, and was treated in order to “excise” the recombinant Bluescript plasmid following the manufacturer's instructions (Stratagene). The restriction map of the insert (coding for the CCR) contained in this plasmid is shown in diagram form in FIG. 1.

IV Characterization and Identification of the cDNA of the CCR

The amino acid sequence (represented by SEQ ID NO 6) deduced from the nucleotide sequence (represented by SEQ ID NO 5) corresponds to a protein of 335 amino acids, the molecular weight of which is 36.5 kD and the isoelectric point of which is about 5.8. It is important to emphasize that all the peptide sequences obtained from the purified CCR are found in the peptide sequence deduced from the nucleotide sequence of the cDNA.
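
As a rough plausibility check on the figure of 36.5 kD, the mass of a protein can be estimated from its length alone. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from the present text; the exact value depends on the amino acid composition given in SEQ ID NO 6.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average, not a value from the text


def estimate_mass_kd(n_residues, average_residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Approximate protein mass in kD from the number of amino acid residues."""
    return n_residues * average_residue_mass_da / 1000.0


estimated = estimate_mass_kd(335)
print(f"estimated mass for 335 residues: {estimated:.1f} kD")
```

The estimate falls between the 36.5 kD deduced from the sequence and the 38 kD observed for the purified enzyme, so the three values are mutually consistent at this level of approximation.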

Investigations of homologies with already-existing clones were carried out using the BLAST and FASTA programs in all the available protein and nucleic acid banks. A significant homology was found with another reductase of the metabolism of phenolic compounds, dihydroflavonol reductase (DFR). The identity is about 40% and the similarity approaches 20% between the peptide sequence deduced from the cDNA of the CCR and the sequences of the various dihydroflavonol reductases catalogued in the banks, which confirms that the clone identified differs from a clone which codes for a DFR.

V Production of Active Recombinant CCR in E. coli

To take the identification of the cDNA of the CCR further, the recombinant protein was produced in E. coli and its enzymatic activity was investigated. The experimental details of this approach are described below.

1—Introduction of the cDNA into the Expression Vector pT7-7.

In order to be able to clone the cDNA in the expression vector pT7-7 (commercially available) under the control of the promoter of T7 polymerase, we had to introduce an NdeI site at the ATG of the cDNA. This was carried out with the aid of a Taq polymerase during a reaction for gene amplification by PCR (polymerase chain reaction) between a mutated oligonucleotide and a commercial primer, T7, situated on the Bluescript downstream of the 3′ end of the cDNA. The amplification product obtained is digested by KpnI, this site is then repaired with the aid of the Klenow fragment of DNA polymerase I before the fragment is subjected to digestion by NdeI, and the fragment obtained, containing an NdeI site at 5′ and a free end at 3′, is then inserted with the aid of a T4 DNA ligase into the vector pT7-7, which has been opened beforehand by NdeI and SmaI.

The sequence of the abovementioned mutated oligonucleotide is indicated below.

The bases underlined and in italics were modified with respect to the initial sequence, allowing the creation of an NdeI site (CATATG):

5′GGCAATCCCCATATGCCCGTCGACGC3′ (SEQ ID NO: 29)
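
The position of the NdeI site within this oligonucleotide, and the fact that the site carries the ATG start codon, can be verified with a few lines of code. This is a minimal sketch; the function name is hypothetical and positions are 0-based.

```python
PRIMER = "GGCAATCCCCATATGCCCGTCGACGC"  # SEQ ID NO 29, written 5' to 3'
NDEI_SITE = "CATATG"                   # NdeI recognition sequence


def find_site(sequence, site):
    """Return the 0-based index of the first occurrence of site, or -1 if absent."""
    return sequence.upper().find(site.upper())


site_start = find_site(PRIMER, NDEI_SITE)
atg_start = site_start + 3  # the ATG occupies the last three bases of CATATG

print(f"primer length: {len(PRIMER)} nt")
print(f"NdeI site at positions {site_start}-{site_start + len(NDEI_SITE) - 1}")
print(f"start codon: {PRIMER[atg_start:atg_start + 3]} at position {atg_start}")
```

Because the ATG lies inside the recognition sequence, cutting with NdeI leaves the start codon at the very end of the fragment, which is what allows it to be joined in frame to the NdeI site of the expression vector.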

2. Over-expression of the CCR in E. coli BL21

The construction thus obtained is introduced into the strain E. coli BL21 (commercially available), which carries on its chromosome the gene of T7 polymerase under control of the promoter lac UV5, a promoter which can be induced by IPTG. The recombinant strain is cultured at 37° C. until the OD measured is 1 at 600 nm, and the production of the CCR is then induced by addition of IPTG (0.25% final concentration) to the culture medium. Samples are taken at various times after the induction, and the cells are lysed by the protocol described by Grima-Pettenati et al. (1993). After centrifugation, the supernatant containing the soluble proteins is used to measure the CCR activity and to visualize the production of CCR, after electrophoresis under denaturing conditions. The appearance of a polypeptide of about 38 kD, the intensity of which increases with the post-induction time and which does not exist in negative controls (strain BL21 containing only the vector pT7-7 without the insert), is found. Furthermore, the final proof of the identity of the CCR clone is provided by measurement of a CCR activity (about 7 nKat/ml culture after induction for 3 h at 37° C.) only in the protein extracts originating from strains of BL21 containing pT7-7+cDNA CCR.

The vector called pEUCCR (shown on FIG. 2), containing the sequence represented by SEQ ID NO 5 cloned in the Bluescript vector, has been deposited in culture in cells of E. coli DH5α at the Collection Nationale de Culture de Micro-organismes [National Culture Collection of Micro-organisms] (CNCM) of the Institut Pasteur in Paris (France) on March 17th, 1994 under no. I-1405.

LEGEND TO THE FIGURES

FIG. 1: Restriction map of the cDNA which codes for the CCR of eucalyptus.

FIG. 2: Schematic representation of the plasmid pEUCCR containing the sequence represented by SEQ ID NO 5 (and identified by CCR in the plasmid pEUCCR).

FIG. 3: Schematic representation of the construction of a vector containing a DNA sequence which codes for the CCR of eucalyptus according to the invention (or sense CCR vector).

FIG. 4: Schematic representation of the construction of a vector containing a DNA sequence which codes for an antisense RNA which is capable of hybridizing with the mRNA which codes for the CCR of eucalyptus according to the invention (or antisense CCR vector).

REGARDING THE DNA SOURCE SERVING FOR THE CONSTRUCTION OF AN ANTISENSE (OR SENSE) VECTOR

The antisense RNA is preferentially derived from the sequence contained in the clone pEUCCR. This sequence can be obtained in various ways:

1) by cutting the DNA (cDNA) sequence of the CCR contained in pEUCCR with suitable restriction enzymes,

2) by performing a gene amplification (PCR) with the aid of defined oligonucleotides such that the desired DNA fragment is synthesized.

The DNA fragment thus obtained is cloned in a plant expression vector downstream of a promoter and upstream of a terminator. The cloning is carried out such that the DNA fragment is inserted in reverse orientation with respect to the promoter. In this new vector, the strand which was initially the template strand becomes the coding strand and vice versa.

The new vector codes for an RNA, the sequence of which is complementary to the messenger RNA sequence deduced from the sequence contained in pEUCCR.

The two RNAs are thus complementary both in their sequence and in their orientation (5′-3′).

As the source of DNA for transcription of the antisense RNA, it is appropriate to use a cDNA clone such as that contained in pEUCCR.
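The relationship between the messenger RNA and the antisense RNA described above can be illustrated with a short sketch. It assumes, purely for illustration, a 15-nucleotide stretch taken from the start of the coding region of SEQ ID NO 5; the function names are hypothetical, and the full-length transcripts produced in plants would of course be much longer.

```python
# Illustrative sketch of sense/antisense complementarity; names are hypothetical.

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def transcribe(coding_strand_dna: str) -> str:
    """Return the mRNA (5'->3') matching a coding-strand DNA sequence."""
    return coding_strand_dna.upper().replace("T", "U")


def antisense_rna(mrna: str) -> str:
    """Return the antisense RNA (5'->3'): the reverse complement of the mRNA."""
    return mrna.translate(_RNA_COMPLEMENT)[::-1]


# First 15 nt of the coding region of the eucalyptus CCR cDNA (from SEQ ID NO 5)
coding_dna = "ATGCCCGTCGACGCC"

sense = transcribe(coding_dna)   # what the sense construct would transcribe
antisense = antisense_rna(sense)  # what the reverse-orientation insert would transcribe
```

Because the antisense RNA is the reverse complement of the messenger RNA, each base of one strand faces its complementary base on the other when the two are aligned in opposite 5′-3′ directions, which is what allows the two molecules to hybridize.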

Example of Antisense Cloning (cf. FIG. 4)

The cDNA of the CCR is obtained from the vector pEUCCR by a double digestion (BamHI and KpnI). The DNA fragment thus released is separated physically from the cloning vector (Bluescript) by agarose gel electrophoresis.

The part of the gel containing this DNA fragment is cut out and treated to obtain the purified DNA (several methods can be used, including “low-melting agarose”, described in Sambrook et al., loc. cit., and Gene Clean, the kit for which is commercially available).

The fragment which carries the BamHI and KpnI ends is ligated with a plant expression vector which has been digested beforehand by these same enzymes, chosen such that the cDNA is inserted in reverse orientation with respect to the 35S promoter. The strand which will be transcribed in the plants will in this case be the non-coding strand.

Example of Sense Cloning (cf. FIG. 3)

In this case, there are no convenient restriction sites for realizing a translational fusion with the 35S promoter of the expression vector. More convenient new sites were therefore introduced with the aid of the gene amplification technique (PCR). Two oligonucleotides were defined at the 5′ and 3′ ends of the cDNA, to which were added the sequences of the sites recognized by KpnI and BamHI (NB: these are the same sites as were used for the abovementioned antisense cloning, but they are positioned differently with respect to the 5′-3′ orientation).

Gene amplification yields a fragment containing the entire coding sequence of the cDNA flanked by 2 restriction sites. The subsequent procedure is identical to that described for the antisense construction.

In this case, however, a fusion of the promoter in phase with the ATG of the CCR has been realized, which should lead to an over-expression of the messenger RNA and therefore of the CCR protein.
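The principle of adding restriction sites to the ends of the coding sequence by PCR can be sketched as follows. The actual oligonucleotides are not given in the text, so everything below is hypothetical: the 18-nucleotide annealing regions (read from the ends of the coding region of SEQ ID NO 5), the placement of the KpnI site on the 5′ primer and the BamHI site on the 3′ primer, and the function names. Primers used in practice would typically also carry a few extra bases upstream of each site.

```python
# Hypothetical primer sketch; the actual oligonucleotides are not given in the text.

KPNI_SITE = "GGTACC"   # KpnI recognition sequence
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence

# First and last 18 nt of the eucalyptus CCR coding region (from SEQ ID NO 5);
# the 3' stretch includes the TGA stop codon. The 18-nt length is an assumption.
CDS_START = "ATGCCCGTCGACGCCCTC"
CDS_END = "GTGCGTATTCAGGGATGA"

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def tailed_primers(cds_start: str, cds_end: str,
                   five_prime_site: str, three_prime_site: str) -> tuple[str, str]:
    """Build forward and reverse primers carrying a restriction site at their 5' end."""
    forward = five_prime_site + cds_start
    reverse = three_prime_site + reverse_complement(cds_end)
    return forward, reverse


forward_primer, reverse_primer = tailed_primers(
    CDS_START, CDS_END, KPNI_SITE, BAMHI_SITE
)
```

Swapping the two site arguments would place the sites as in the antisense arrangement, which is one way to picture the remark that the same sites are used but positioned differently with respect to the 5′-3′ orientation.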

The examples of cloning of sense and antisense sequences described above in the case of the CCR of eucalyptus can also be applied in the case of the CCR of lucerne and that of maize.

B) Preparation of the cDNA Which Codes for the CCR of Lucerne (Medicago truncatula)

Characteristics of the cDNA Bank:

The bank used was constructed from total RNA extracts of roots of Medicago truncatula in the vector λZAPH (“ZAP-cDNA synthesis” kit from Stratagène).

Screening of the cDNA Bank:

Probe:

Screening of the lucerne bank was carried out with the aid of the cDNA which codes for the CCR of eucalyptus. A fragment of 800 bp (Xho-Xho) of pEUCCR, labelled by the random priming technique, was used as the probe.

Display of the Bank and Imprints on Nitrocellulose Filter:

300,000 clones were displayed and then transferred to nitrocellulose filters (Schleicher & Schuell). For this, the filters were placed on the culture boxes for 5 min and then immersed successively in the following solutions:

1.5 M NaCl/0.5 M NaOH: 5 min

1.5 M NaCl/0.5 M Tris pH 8: 5 min

3× SSC: 2 min

heating for 2 hours at 80° C.

Prehybridization-hybridization:

The filters were prehybridized for 12 hours and then hybridized for 24 hours at 37° C. in the following medium:

Prehybridization and hybridization medium:

formamide 20%

dextran 10%

NaCl 1 M

DNA of salmon sperm (1 mg/ml)

0.2% polyvinylpyrrolidone

0.2% BSA

0.2% ficoll

0.05 M Tris-HCl pH 7.5

0.1% sodium pyrophosphate

1% SDS.

After hybridization, the filters were washed twice for 10 min at ambient temperature in 2× SSC-1% SDS, and then twice for 30 min at 55° C. in the same solution.

After autoradiographic exposure of the filters, 15 positive lysis plaques were identified. These lysis plaques were purified by two additional screening cycles under the hybridization conditions described above.

Excision in vivo:

From the positive clones, the Bluescript plasmid of the λ phage was excised by the in vivo excision protocol of the “ZAP-cDNA synthesis kit”.

The CCR cDNA of lucerne:

The cDNA which codes for the CCR of lucerne, 1,404 bp in size, is inserted into the EcoRI (5′ side) and Xho (3′ side) sites of the Bluescript vector. It is made up of the following parts:

a non-translated transcribed 5′ part of 167 bp,

a region of 1,028 bp which codes for a protein of 342 amino acids,

a non-translated transcribed 3′ part of 209 bp.

The cDNA obtained is represented by SEQ ID NO 1, and the sequence of amino acids deduced from this cDNA is represented by SEQ ID NO 2.
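The sizes quoted above for the lucerne cDNA can be cross-checked with simple arithmetic. The sketch below is illustrative and not part of the original protocol: it sums the three parts and compares the coding region with the length implied by a protein of 342 amino acids. The CDS coordinates 278..1306 are those given for SEQ ID NO 1 in the sequence listing; the variable and function names are hypothetical.

```python
# Consistency check on the quoted sizes; names are hypothetical.

FIVE_PRIME_UTR_BP = 167
CODING_REGION_BP = 1028   # as quoted in the text
THREE_PRIME_UTR_BP = 209
STATED_TOTAL_BP = 1404
PROTEIN_LENGTH_AA = 342


def span_length(first: int, last: int) -> int:
    """Length of a feature given inclusive 1-based coordinates."""
    return last - first + 1


# Sum of the three parts described in the text
parts_total_bp = FIVE_PRIME_UTR_BP + CODING_REGION_BP + THREE_PRIME_UTR_BP

# Length implied by the protein: 3 nt per codon, plus one stop codon
codons_bp = PROTEIN_LENGTH_AA * 3
codons_with_stop_bp = codons_bp + 3

# Length of the CDS feature given in the sequence listing for SEQ ID NO 1
listing_cds_bp = span_length(278, 1306)
```

Under these assumptions the three parts sum to the stated 1,404 bp, whereas 342 codons plus a stop codon occupy 1,029 bp, which matches the CDS span in the sequence listing and is one base more than the 1,028 bp quoted. The sketch only flags this difference and does not resolve it.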

C) Preparation of the cDNA Which Codes for the CCR of Maize

Characteristics of the cDNA Bank:

The bank used was constructed from total RNA extracts from the roots of maize (variety AMO 406) deprived of iron, in the vector λZAP (“ZAP-cDNA synthesis” kit from Stratagène).

Screening of the cDNA Bank:

Probe:

Screening of the maize bank was carried out with the aid of the CCR cDNA of eucalyptus. A fragment of 800 bp (Xho-Xho) of pEUCCR, labelled by the random priming technique, was used as the probe.

Display of the Bank and Imprints on Nitrocellulose Filter:

500,000 clones were displayed and then transferred to nitrocellulose filters (Schleicher & Schuell). For this, the filters were placed on the culture boxes for 5 min and then immersed successively in the following solutions:

1.5 M NaCl/0.5 M NaOH: 5 min

1.5 M NaCl/0.5 M Tris pH 8: 5 min

3× SSC: 2 min

heating at 80° C. for 2 hours.

Prehybridization-hybridization:

The filters were prehybridized for 12 hours and then hybridized for 24 hours at 55° C. in the following medium:

Prehybridization and hybridization medium:

3×SSC

0.5% SDS

0.1% powdered milk

DNA of salmon sperm (1 mg/ml).

After hybridization, the filters were washed twice for 10 min at ambient temperature in 3× SSC-0.5% SDS, and then twice for 45 min at 60° C. in the same solution.

After autoradiographic exposure of the filters, 20 positive lysis plaques were identified. These lysis plaques were purified by 3 additional screening cycles under the hybridization conditions described above.

Excision in vivo:

From the positive clones, the Bluescript plasmid of the λ phage was excised by the in vivo excision protocol of the “ZAP-cDNA synthesis kit”.

The cDNA obtained is represented by SEQ ID NO 3, and the sequence of amino acids deduced from this cDNA is represented by SEQ ID NO 4.

29 1568 base pairs nucleic acid double linear cDNA to mRNA unknown CDS278..1306 1 CATGATTACG CCAAGCTCGA AATTAACCCT CACTAAAGGG AACAAAAGCTGGAGCTCCAC 60 CGCGGTGGCG GCCGCTCTAG AACTAGTGGA TCCCCCGGGC TGCAGGAATTCGGCACGAG 120 GGATAGAGAA GAAAGGTGGT CATATTTCCC ACTTATTATT ACAAAGTAACGTCACACCA 180 CTTTATCACC ACCTTTCTTC TCTATCCCAT TCATTCTCAT TCATTCATTCACCTCACCT 240 ACCTCACCTC ACCTCCCTTT ACAAGAAGAA GGAATAT ATG CCT GCC GCTACC GCA 295 Met Pro Ala Ala Thr Ala 1 5 GCC GCC GCC GCC GAA TCT TCC TCAGTT TCC GGC GAA ACC ATA TGT GTC 343 Ala Ala Ala Ala Glu Ser Ser Ser ValSer Gly Glu Thr Ile Cys Val 10 15 20 ACC GGG GCC GGT GGC CTC ATC GCT TCTTGG ATG GTT AAG CTC CTC TTG 391 Thr Gly Ala Gly Gly Leu Ile Ala Ser TrpMet Val Lys Leu Leu Leu 25 30 35 GAG AAA GGC TAT ACC GTT CGA GGA ACC TTGCGA AAC CCA GAT GAT CCA 439 Glu Lys Gly Tyr Thr Val Arg Gly Thr Leu ArgAsn Pro Asp Asp Pro 40 45 50 AAA AAT GGG CAC TTG AAA AAG TTG GAA GGA GCAAAA GAA AGG CTA ACT 487 Lys Asn Gly His Leu Lys Lys Leu Glu Gly Ala LysGlu Arg Leu Thr 55 60 65 70 TTG GTC AAA GTT GAT CTC CTT GAT CTT AAC TCCGTT AAA GAA GCT GTT 535 Leu Val Lys Val Asp Leu Leu Asp Leu Asn Ser ValLys Glu Ala Val 75 80 85 AAT GGA TGT CAT GGT GTC TTT CAC ACT GCT TCT CCCGTT ACA GAT AAC 583 Asn Gly Cys His Gly Val Phe His Thr Ala Ser Pro ValThr Asp Asn 90 95 100 CCC GAG GAA ATG GTG GAG CCA GCA GTG AAT GGA GCAAAG AAT GTG ATC 631 Pro Glu Glu Met Val Glu Pro Ala Val Asn Gly Ala LysAsn Val Ile 105 110 115 ATA GCT GGT GCA GAA GCA AAA GTG AGG CGC GTG GTTTTC ACA TCA TCA 679 Ile Ala Gly Ala Glu Ala Lys Val Arg Arg Val Val PheThr Ser Ser 120 125 130 ATT GGT GCA GTC TAT ATG GAC CCC AAT AGG AGT GTTGAT GTA GAG GTT 727 Ile Gly Ala Val Tyr Met Asp Pro Asn Arg Ser Val AspVal Glu Val 135 140 145 150 GAT GAG TCT TGC TGG AGT GAT TTG GAG TTT TGCAAG AAA ACC AAG AAT 775 Asp Glu Ser Cys Trp Ser Asp Leu Glu Phe Cys LysLys Thr Lys Asn 155 160 165 TGG TAT TGC TAT GGG AAA GCA GTG GCA GAA GCAGCA GCA TGG GAT GTA 823 Trp Tyr Cys Tyr Gly Lys Ala Val Ala Glu Ala AlaAla Trp Asp Val 170 175 
180 GCA AAA GAG AAA GGT GTG GAT TTG GTT GTA GTGAAT CCA GTT TTG GTT 871 Ala Lys Glu Lys Gly Val Asp Leu Val Val Val AsnPro Val Leu Val 185 190 195 CTT GGA CCA TTG CTA CAA CCT ACA ATC AAT GCAAGC ACA ATT CAC ATA 919 Leu Gly Pro Leu Leu Gln Pro Thr Ile Asn Ala SerThr Ile His Ile 200 205 210 CTA AAA TAC CTA ACT GGT TCA GCT AAG ACC TATGCA AAT GCA ACA CAA 967 Leu Lys Tyr Leu Thr Gly Ser Ala Lys Thr Tyr AlaAsn Ala Thr Gln 215 220 225 230 GCT TAT GTT CAT GTT AGG GAT GTT GCA TTAGCT CAC ATA CTT GTT TAT 1015 Ala Tyr Val His Val Arg Asp Val Ala Leu AlaHis Ile Leu Val Tyr 235 240 245 GAG AAA CCT TCT GCT TCT GGT AGA TAC TTATGT GCT GAA ACT TCA CTT 1063 Glu Lys Pro Ser Ala Ser Gly Arg Tyr Leu CysAla Glu Thr Ser Leu 250 255 260 CAT CGT GGG GAG CTT GTT GAA ATT CTT GCTAAG TAT TTC CCT GAG TAC 1111 His Arg Gly Glu Leu Val Glu Ile Leu Ala LysTyr Phe Pro Glu Tyr 265 270 275 CCA ATT CCT ACC AAG TGT TCA GAT GAA AAGAAT CCT CGA GTG AAA CCA 1159 Pro Ile Pro Thr Lys Cys Ser Asp Glu Lys AsnPro Arg Val Lys Pro 280 285 290 CAT ATC TTC TCA AAT AAA AAA CTG AAG GATTTG GGA TTG GAA TTT ACA 1207 His Ile Phe Ser Asn Lys Lys Leu Lys Asp LeuGly Leu Glu Phe Thr 295 300 305 310 CCA GTG AGT GAA TGT TTA TAT GAA ACCGTT AAG AGC CTA CAA GAC CAA 1255 Pro Val Ser Glu Cys Leu Tyr Glu Thr ValLys Ser Leu Gln Asp Gln 315 320 325 GGT CAC CTT TCT ATT CCA AAC AAA GAAGAT TCT CTA GCA GTC AAA TCC 1303 Gly His Leu Ser Ile Pro Asn Lys Glu AspSer Leu Ala Val Lys Ser 330 335 340 TAAACCAACC ATCCTTTGTT AACAAGTTCAATTCAGGGCC AAAAAGAATC ATCTTTTA 1363 TACCTGCGAG GCTTTAGGCT CTAGCAATTTGATACTATAA ATGACCGTAA TTGGATGG 1423 AGTTGTAAGA AAGTATCATG CTAGAATTTACTATTTGTCT TTATGTTTGA AAAATAAG 1483 CATTATATTA AAAAAAAAAA AAAAAAAAAAAACTCGAGGG GGGGCCCGGT ACCCAATT 1543 CCCTATAGTG AGTCGTATTA CAATT 1568 342amino acids amino acid linear protein unknown 2 Met Pro Ala Ala Thr AlaAla Ala Ala Ala Glu Ser Ser Ser Val Ser 1 5 10 15 Gly Glu Thr Ile CysVal Thr Gly Ala Gly Gly Leu Ile Ala Ser Trp 20 25 30 Met Val Lys Leu LeuLeu Glu Lys Gly Tyr Thr Val Arg Gly Thr 
Leu 35 40 45 Arg Asn Pro Asp AspPro Lys Asn Gly His Leu Lys Lys Leu Glu Gly 50 55 60 Ala Lys Glu Arg LeuThr Leu Val Lys Val Asp Leu Leu Asp Leu Asn 65 70 75 80 Ser Val Lys GluAla Val Asn Gly Cys His Gly Val Phe His Thr Ala 85 90 95 Ser Pro Val ThrAsp Asn Pro Glu Glu Met Val Glu Pro Ala Val Asn 100 105 110 Gly Ala LysAsn Val Ile Ile Ala Gly Ala Glu Ala Lys Val Arg Arg 115 120 125 Val ValPhe Thr Ser Ser Ile Gly Ala Val Tyr Met Asp Pro Asn Arg 130 135 140 SerVal Asp Val Glu Val Asp Glu Ser Cys Trp Ser Asp Leu Glu Phe 145 150 155160 Cys Lys Lys Thr Lys Asn Trp Tyr Cys Tyr Gly Lys Ala Val Ala Glu 165170 175 Ala Ala Ala Trp Asp Val Ala Lys Glu Lys Gly Val Asp Leu Val Val180 185 190 Val Asn Pro Val Leu Val Leu Gly Pro Leu Leu Gln Pro Thr IleAsn 195 200 205 Ala Ser Thr Ile His Ile Leu Lys Tyr Leu Thr Gly Ser AlaLys Thr 210 215 220 Tyr Ala Asn Ala Thr Gln Ala Tyr Val His Val Arg AspVal Ala Leu 225 230 235 240 Ala His Ile Leu Val Tyr Glu Lys Pro Ser AlaSer Gly Arg Tyr Leu 245 250 255 Cys Ala Glu Thr Ser Leu His Arg Gly GluLeu Val Glu Ile Leu Ala 260 265 270 Lys Tyr Phe Pro Glu Tyr Pro Ile ProThr Lys Cys Ser Asp Glu Lys 275 280 285 Asn Pro Arg Val Lys Pro His IlePhe Ser Asn Lys Lys Leu Lys Asp 290 295 300 Leu Gly Leu Glu Phe Thr ProVal Ser Glu Cys Leu Tyr Glu Thr Val 305 310 315 320 Lys Ser Leu Gln AspGln Gly His Leu Ser Ile Pro Asn Lys Glu Asp 325 330 335 Ser Leu Ala ValLys Ser 340 1556 base pairs nucleic acid double linear cDNA to mRNAunknown CDS 195..1310 3 GTTCCCATGA TTACGCCAAG CTCGAAATTA ACCCTCACTAAAGGGAACAA AAGCTGGAGC 60 TCCACCGCGG TGGCGGCCGC TCTAGAACTA GTGGATCCCCCGGGCTGCAG GAATTCGGC 120 CGAGAGGACA CAAGCGAGCG CTAGCCAGAA GAGCAGCTGCAGGTACTATT ATCATCGTC 180 TCGTCGTCGC CAGG ATG ACC GTC GTC GAC GCC GTC GTCTCC TCC ACC GAT 230 Met Thr Val Val Asp Ala Val Val Ser Ser Thr Asp 1 510 GCC GGC GCC CCT GCC GCC GCC GCC GCA CCG GTA CCG GCG GGG AAC GGG 278Ala Gly Ala Pro Ala Ala Ala Ala Ala Pro Val Pro Ala Gly Asn Gly 15 20 25CAG ACC GTG TGC GTC ACC GGC GCG GCC GGG TAC ATC GCC TCG TGG 
TTG 326 GlnThr Val Cys Val Thr Gly Ala Ala Gly Tyr Ile Ala Ser Trp Leu 30 35 40 GTGAAG CTG CTG CTC GAG AAG GGA TAC ACT GTG AAG GGC ACC GTG AGG 374 Val LysLeu Leu Leu Glu Lys Gly Tyr Thr Val Lys Gly Thr Val Arg 45 50 55 60 AACCCA GAT GAC CCG AAG AAC GCG CAC CTC AGG GCG CTG GAC GGC GCC 422 Asn ProAsp Asp Pro Lys Asn Ala His Leu Arg Ala Leu Asp Gly Ala 65 70 75 GCC GAGCGG CTG ATC CTC TGC AAG GCC GAT CTG CTG GAC TAC GAC GCC 470 Ala Glu ArgLeu Ile Leu Cys Lys Ala Asp Leu Leu Asp Tyr Asp Ala 80 85 90 ATC TGC CGCGCC GTG CAG GGC TGC CAG GGC GTC TTC CAC ACC GCC TCC 518 Ile Cys Arg AlaVal Gln Gly Cys Gln Gly Val Phe His Thr Ala Ser 95 100 105 CCC GTC ACCGAC GAC CCG GAG CAA ATG GTG GAG CCG GCG GTG CGC GGC 566 Pro Val Thr AspAsp Pro Glu Gln Met Val Glu Pro Ala Val Arg Gly 110 115 120 ACC GAG TACGTG ATC AAC GCG GCG GCG GAG GCC GGC ACG GTG CGG CGG 614 Thr Glu Tyr ValIle Asn Ala Ala Ala Glu Ala Gly Thr Val Arg Arg 125 130 135 140 GTG GTGTTC ACG TCG TCC ATC GGC GCC GTG ACC ATG GAC CCC AAG CGC 662 Val Val PheThr Ser Ser Ile Gly Ala Val Thr Met Asp Pro Lys Arg 145 150 155 GGG CCCGAC GTC GTG GTC GAC GAG TCG TGC TGG AGC GAC CTC GAG TTC 710 Gly Pro AspVal Val Val Asp Glu Ser Cys Trp Ser Asp Leu Glu Phe 160 165 170 TGC GAGAAA ACC AGG AAC TGG TAC TGC TAC GGC AAG GCG GTG GCG GAG 758 Cys Glu LysThr Arg Asn Trp Tyr Cys Tyr Gly Lys Ala Val Ala Glu 175 180 185 CAG GCGGCG TGG GAG GCG GCC CGG CGG CGG GGC GTG GAC CTG GTG GTG 806 Gln Ala AlaTrp Glu Ala Ala Arg Arg Arg Gly Val Asp Leu Val Val 190 195 200 GTG AACCCC GTG CTG GTG GTG GGC CCC CTG CTG CAG GCG ACG GTG AAC 854 Val Asn ProVal Leu Val Val Gly Pro Leu Leu Gln Ala Thr Val Asn 205 210 215 220 GCCAGC ATC GCG CAC ATC CTC AAG TAC CTG GAC GGC TCG GCC CGC ACC 902 Ala SerIle Ala His Ile Leu Lys Tyr Leu Asp Gly Ser Ala Arg Thr 225 230 235 TTCGCC AAC GCC GTG CAG GCG TAC GTG GAC GTG CGC GAC GTG GCC GAC 950 Phe AlaAsn Ala Val Gln Ala Tyr Val Asp Val Arg Asp Val Ala Asp 240 245 250 GCGCAC CTC CGC GTC TTC GAG AGC CCC CGC GCG TCC GGC CGC CAC CTC 998 Ala 
HisLeu Arg Val Phe Glu Ser Pro Arg Ala Ser Gly Arg His Leu 255 260 265 TGCGCC GAG CGC GTC CTC CAC CGC GAG GAC GTC GTC CGC ATC CTC GCC 1046 Cys AlaGlu Arg Val Leu His Arg Glu Asp Val Val Arg Ile Leu Ala 270 275 280 AAGCTC TTC CCC GAG TAC CCC GTC CCA GCC AGG TGC TCC GAC GAG GTG 1094 Lys LeuPhe Pro Glu Tyr Pro Val Pro Ala Arg Cys Ser Asp Glu Val 285 290 295 300AAT CCG CGG AAG CAG CCG TAC AAG TTC TCC AAC CAG AAG CTC CGG GAC 1142 AsnPro Arg Lys Gln Pro Tyr Lys Phe Ser Asn Gln Lys Leu Arg Asp 305 310 315CTG GGG CTG CAG TTC CGG CCG GTC AGC CAG TCG CTT TAC GAC ACG GTG 1190 LeuGly Leu Gln Phe Arg Pro Val Ser Gln Ser Leu Tyr Asp Thr Val 320 325 330AAG AAC CTC CAG GAG AAG GGC CAC CTG CCG GTG CTC GGA GAG CGG ACG 1238 LysAsn Leu Gln Glu Lys Gly His Leu Pro Val Leu Gly Glu Arg Thr 335 340 345ACG ACG GAG GCC GCC GAC AAG GAT GCC CCC GCG GCC GAG ATG CAG CAG 1286 ThrThr Glu Ala Ala Asp Lys Asp Ala Pro Ala Ala Glu Met Gln Gln 350 355 360GGA GGG ATC GCC ATC CGT GCC TGAGAGGGCG ATGCCACACA TGAACACCAA 1337 GlyGly Ile Ala Ile Arg Ala 365 370 AGCAATGTTC ATACTGCTGC CCTGCACCTGCACCTTCCCC TGCTGTGTAA ACAGGCCT 1397 GTTTGTTCTG GCTGATAGTG ATGTACCCTAAGACTTGTAA CGTCATGTTC GTTCTTGT 1457 ACTATAGCGA GTGAATAAAA TTGGTTAATGTTGGATAATT CCAAAAAAAA AAAAAAAA 1517 CTCGAGGGGG GGCCCGGTAC CCAATTCGCCCTATAGTGA 1556 371 amino acids amino acid linear protein unknown 4 MetThr Val Val Asp Ala Val Val Ser Ser Thr Asp Ala Gly Ala Pro 1 5 10 15Ala Ala Ala Ala Ala Pro Val Pro Ala Gly Asn Gly Gln Thr Val Cys 20 25 30Val Thr Gly Ala Ala Gly Tyr Ile Ala Ser Trp Leu Val Lys Leu Leu 35 40 45Leu Glu Lys Gly Tyr Thr Val Lys Gly Thr Val Arg Asn Pro Asp Asp 50 55 60Pro Lys Asn Ala His Leu Arg Ala Leu Asp Gly Ala Ala Glu Arg Leu 65 70 7580 Ile Leu Cys Lys Ala Asp Leu Leu Asp Tyr Asp Ala Ile Cys Arg Ala 85 9095 Val Gln Gly Cys Gln Gly Val Phe His Thr Ala Ser Pro Val Thr Asp 100105 110 Asp Pro Glu Gln Met Val Glu Pro Ala Val Arg Gly Thr Glu Tyr Val115 120 125 Ile Asn Ala Ala Ala Glu Ala Gly Thr Val Arg Arg Val Val PheThr 130 135 140 Ser 
Ser Ile Gly Ala Val Thr Met Asp Pro Lys Arg Gly ProAsp Val 145 150 155 160 Val Val Asp Glu Ser Cys Trp Ser Asp Leu Glu PheCys Glu Lys Thr 165 170 175 Arg Asn Trp Tyr Cys Tyr Gly Lys Ala Val AlaGlu Gln Ala Ala Trp 180 185 190 Glu Ala Ala Arg Arg Arg Gly Val Asp LeuVal Val Val Asn Pro Val 195 200 205 Leu Val Val Gly Pro Leu Leu Gln AlaThr Val Asn Ala Ser Ile Ala 210 215 220 His Ile Leu Lys Tyr Leu Asp GlySer Ala Arg Thr Phe Ala Asn Ala 225 230 235 240 Val Gln Ala Tyr Val AspVal Arg Asp Val Ala Asp Ala His Leu Arg 245 250 255 Val Phe Glu Ser ProArg Ala Ser Gly Arg His Leu Cys Ala Glu Arg 260 265 270 Val Leu His ArgGlu Asp Val Val Arg Ile Leu Ala Lys Leu Phe Pro 275 280 285 Glu Tyr ProVal Pro Ala Arg Cys Ser Asp Glu Val Asn Pro Arg Lys 290 295 300 Gln ProTyr Lys Phe Ser Asn Gln Lys Leu Arg Asp Leu Gly Leu Gln 305 310 315 320Phe Arg Pro Val Ser Gln Ser Leu Tyr Asp Thr Val Lys Asn Leu Gln 325 330335 Glu Lys Gly His Leu Pro Val Leu Gly Glu Arg Thr Thr Thr Glu Ala 340345 350 Ala Asp Lys Asp Ala Pro Ala Ala Glu Met Gln Gln Gly Gly Ile Ala355 360 365 Ile Arg Ala 370 1297 base pairs nucleic acid double linearcDNA to mRNA unknown CDS 136..1140 5 CGGCCGGGAC GACCCGTTCC TCTTCTTCCGGGTCACCGTC ACCATGTTAC ACAACATCTC 60 CGGCTAAAAA AAAAAGGAAA AAAAGCGCAACCTCCACCTC CTGAACCCCT CTCCCCCCT 120 GCCGGCAATC CCACC ATG CCC GTC GAC GCCCTC CCC GGT TCC GGC CAG ACC 171 Met Pro Val Asp Ala Leu Pro Gly Ser GlyGln Thr 1 5 10 GTC TGC GTC ACC GGC GCC GGC GGG TTC ATC GCC TCC TGG ATTGTC AAG 219 Val Cys Val Thr Gly Ala Gly Gly Phe Ile Ala Ser Trp Ile ValLys 15 20 25 CTT CTC CTC GAG CGA GGC TAC ACC GTG CGA GGA ACC GTC AGG AACCCA 267 Leu Leu Leu Glu Arg Gly Tyr Thr Val Arg Gly Thr Val Arg Asn Pro30 35 40 GAC GAC CCG AAG AAT GGT CAT CTG AGA GAT CTG GAA GGA GCC AGC GAG315 Asp Asp Pro Lys Asn Gly His Leu Arg Asp Leu Glu Gly Ala Ser Glu 4550 55 60 AGG CTG ACG CTG TAC AAG GGT GAT CTG ATG GAC TAC GGG AGC TTG GAA363 Arg Leu Thr Leu Tyr Lys Gly Asp Leu Met Asp Tyr Gly Ser Leu Glu 6570 75 GAA GCC ATC AAG GGG TGC GAC 
GGC GTC GTC CAC ACC GCC TCT CCG GTC411 Glu Ala Ile Lys Gly Cys Asp Gly Val Val His Thr Ala Ser Pro Val 8085 90 ACC GAC GAT CCT GAG CAA ATG GTG GAG CCA GCG GTG ATC GGG ACG AAA459 Thr Asp Asp Pro Glu Gln Met Val Glu Pro Ala Val Ile Gly Thr Lys 95100 105 AAT GTG ATC GTC GCA GCG GCG GAG GCC AAG GTC CGG CGG GTT GTG TTC507 Asn Val Ile Val Ala Ala Ala Glu Ala Lys Val Arg Arg Val Val Phe 110115 120 ACC TCC TCC ATC GGT GCA GTC ACC ATG GAC CCC AAC CGG GCA GAC GTT555 Thr Ser Ser Ile Gly Ala Val Thr Met Asp Pro Asn Arg Ala Asp Val 125130 135 140 GTG GTG GAC GAG TCT TGT TGG AGC GAC CTC GAA TTT TGC AAG AGCACT 603 Val Val Asp Glu Ser Cys Trp Ser Asp Leu Glu Phe Cys Lys Ser Thr145 150 155 AAG AAC TGG TAT TGC TAC GGC AAG GCA GTG GCG GAG AAG GCC GCTTGG 651 Lys Asn Trp Tyr Cys Tyr Gly Lys Ala Val Ala Glu Lys Ala Ala Trp160 165 170 CCA GAG GGC AAG GAG AGA GGG GTT GAC CTC GTG GTG ATT AAC CCTGTG 699 Pro Glu Gly Lys Glu Arg Gly Val Asp Leu Val Val Ile Asn Pro Val175 180 185 CTC GTG CTT GGA CCG CTC CTT CAG TCG ACG ATC AAT GCG AGC ATCATC 747 Leu Val Leu Gly Pro Leu Leu Gln Ser Thr Ile Asn Ala Ser Ile Ile190 195 200 CAC ATC CTC AAG TAC TTG ACT GGC TCA GCC AAG ACC TAC GCC AACTCG 795 His Ile Leu Lys Tyr Leu Thr Gly Ser Ala Lys Thr Tyr Ala Asn Ser205 210 215 220 GTC CAG GCG TAC GTG CAC GTC AAG GAC GTC GCG CTT GCC CACGTC CTT 843 Val Gln Ala Tyr Val His Val Lys Asp Val Ala Leu Ala His ValLeu 225 230 235 GTC TTG GAG ACC CCA TCC GCC TCA GGC CGC TAT TTG TGC GCCGAG AGC 891 Val Leu Glu Thr Pro Ser Ala Ser Gly Arg Tyr Leu Cys Ala GluSer 240 245 250 GTC CTC CAC CGT GGC GAT GTG GTG GAA ATC CTT GCC AAG TTCTTC CCT 939 Val Leu His Arg Gly Asp Val Val Glu Ile Leu Ala Lys Phe PhePro 255 260 265 GAG TAT AAT GTA CCG ACC AAG TGC TCT GAT GAG GTG AAC CCAAGA GTA 987 Glu Tyr Asn Val Pro Thr Lys Cys Ser Asp Glu Val Asn Pro ArgVal 270 275 280 AAA CCA TAC AAG TTC TCC AAC CAG AAG CTG AGA GAC TTG GGGCTC GAG 1035 Lys Pro Tyr Lys Phe Ser Asn Gln Lys Leu Arg Asp Leu Gly LeuGlu 285 290 295 300 TTC ACC CCG GTG AAG CAG TGC 
CTG TAC GAA ACT GTC AAGAGC TTG CAG 1083 Phe Thr Pro Val Lys Gln Cys Leu Tyr Glu Thr Val Lys SerLeu Gln 305 310 315 GAG AAA GGC CAC CTA CCA GTC CCC TCC CCG CCG GAA GATTCG GTG CGT 1131 Glu Lys Gly His Leu Pro Val Pro Ser Pro Pro Glu Asp SerVal Arg 320 325 330 ATT CAG GGA TGATCTTAGA TCCATCACGG TGCGCATTTGTAATCCGGAG 1180 Ile Gln Gly 335 AAATGAGAGA AACATGTGGG AATTTGTTTGTACTTTTCTA AGTCAAACCT GGAGATAC 1240 ACCCTGAGTT CTGCATTGGA ATGGAAGTTGTCAATTGTTC CAAAAAAAAA AAAAAAA 1297 335 amino acids amino acid linearprotein unknown 6 Met Pro Val Asp Ala Leu Pro Gly Ser Gly Gln Thr ValCys Val Thr 1 5 10 15 Gly Ala Gly Gly Phe Ile Ala Ser Trp Ile Val LysLeu Leu Leu Glu 20 25 30 Arg Gly Tyr Thr Val Arg Gly Thr Val Arg Asn ProAsp Asp Pro Lys 35 40 45 Asn Gly His Leu Arg Asp Leu Glu Gly Ala Ser GluArg Leu Thr Leu 50 55 60 Tyr Lys Gly Asp Leu Met Asp Tyr Gly Ser Leu GluGlu Ala Ile Lys 65 70 75 80 Gly Cys Asp Gly Val Val His Thr Ala Ser ProVal Thr Asp Asp Pro 85 90 95 Glu Gln Met Val Glu Pro Ala Val Ile Gly ThrLys Asn Val Ile Val 100 105 110 Ala Ala Ala Glu Ala Lys Val Arg Arg ValVal Phe Thr Ser Ser Ile 115 120 125 Gly Ala Val Thr Met Asp Pro Asn ArgAla Asp Val Val Val Asp Glu 130 135 140 Ser Cys Trp Ser Asp Leu Glu PheCys Lys Ser Thr Lys Asn Trp Tyr 145 150 155 160 Cys Tyr Gly Lys Ala ValAla Glu Lys Ala Ala Trp Pro Glu Gly Lys 165 170 175 Glu Arg Gly Val AspLeu Val Val Ile Asn Pro Val Leu Val Leu Gly 180 185 190 Pro Leu Leu GlnSer Thr Ile Asn Ala Ser Ile Ile His Ile Leu Lys 195 200 205 Tyr Leu ThrGly Ser Ala Lys Thr Tyr Ala Asn Ser Val Gln Ala Tyr 210 215 220 Val HisVal Lys Asp Val Ala Leu Ala His Val Leu Val Leu Glu Thr 225 230 235 240Pro Ser Ala Ser Gly Arg Tyr Leu Cys Ala Glu Ser Val Leu His Arg 245 250255 Gly Asp Val Val Glu Ile Leu Ala Lys Phe Phe Pro Glu Tyr Asn Val 260265 270 Pro Thr Lys Cys Ser Asp Glu Val Asn Pro Arg Val Lys Pro Tyr Lys275 280 285 Phe Ser Asn Gln Lys Leu Arg Asp Leu Gly Leu Glu Phe Thr ProVal 290 295 300 Lys Gln Cys Leu Tyr Glu Thr Val Lys Ser Leu Gln Glu LysGly 
His 305 310 315 320 Leu Pro Val Pro Ser Pro Pro Glu Asp Ser Val ArgIle Gln Gly 325 330 335 1376 base pairs nucleic acid double linear cDNAto mRNA unknown CDS 99..1112 7 GAAAACACAC CTCCTCTCTT CTTTGTCTCTGTCTGTTCTC CACTTTCCCA GTCACCAAAC 60 TCGTATGCAT ATAATTACAT TTATCTAAATATAACAAC ATG CCT GTT GAT GCT 113 Met Pro Val Asp Ala 1 5 TCA TCA CTT TCAGGC CAA GGC CAA ACT ATC TGT GTC ACC GGG GGT GGT 161 Ser Ser Leu Ser GlyGln Gly Gln Thr Ile Cys Val Thr Gly Gly Gly 10 15 20 GGT TTC ATT GCT TCTTGG ATG GTT AAA CTT CTT TTA GAT AAA GGT TAC 209 Gly Phe Ile Ala Ser TrpMet Val Lys Leu Leu Leu Asp Lys Gly Tyr 25 30 35 ACT GTT AGA GGA ACT GCGAGG AAC CCA GCT GAT CCC AAG AAT TCT CAT 257 Thr Val Arg Gly Thr Ala ArgAsn Pro Ala Asp Pro Lys Asn Ser His 40 45 50 TTG AGG GAG CTT GAA GGA GCTGAA GAA AGA TTA ACT TTA TGC AAA GCT 305 Leu Arg Glu Leu Glu Gly Ala GluGlu Arg Leu Thr Leu Cys Lys Ala 55 60 65 GAT CTT CTT GAT TAT GAG TCT CTTAAA GAG GGT ATT CAA GGG TGT GAT 353 Asp Leu Leu Asp Tyr Glu Ser Leu LysGlu Gly Ile Gln Gly Cys Asp 70 75 80 85 GGT GTT TTC CAC ACT GCT TCT CCTGTC ACA GAT GAT CCG GAA GAA ATG 401 Gly Val Phe His Thr Ala Ser Pro ValThr Asp Asp Pro Glu Glu Met 90 95 100 GTG GAG CCA GCA GTG AAC GGG ACCAAA AAT GTG ATA ATT GCG GCG GCT 449 Val Glu Pro Ala Val Asn Gly Thr LysAsn Val Ile Ile Ala Ala Ala 105 110 115 GAG GCC AAA GTC CGA CGA GTG GTGTTC ACG TCA TCA ATT GGC GCT GTG 497 Glu Ala Lys Val Arg Arg Val Val PheThr Ser Ser Ile Gly Ala Val 120 125 130 TAC ATG GAT CCC AAT AAG GGC CCAGAT GTT GTC ATT GAT GAG TCT TGC 545 Tyr Met Asp Pro Asn Lys Gly Pro AspVal Val Ile Asp Glu Ser Cys 135 140 145 TGG AGT GAT CTT GAA TTC TGC AAGAAC ACC AAG AAT TGG TAT TGC TAT 593 Trp Ser Asp Leu Glu Phe Cys Lys AsnThr Lys Asn Trp Tyr Cys Tyr 150 155 160 165 GGA AAG GCT GTG GCA GAA CAAGCT GCA TGG GAT ATG GCT AAG GAG AAA 641 Gly Lys Ala Val Ala Glu Gln AlaAla Trp Asp Met Ala Lys Glu Lys 170 175 180 GGG GTG GAC CTA GTG GTG GTTAAC CCA GTG CTG GTG CTT GGA CCA TTG 689 Gly Val Asp Leu Val Val Val AsnPro Val Leu Val Leu Gly 
Pro Leu 185 190 195 TTG CAG CCC ACT GTC AAT GCTAGC ATC ACT CAC ATC CTC AAG TAC CTC 737 Leu Gln Pro Thr Val Asn Ala SerIle Thr His Ile Leu Lys Tyr Leu 200 205 210 ACC GGC TCA GCC AAG ACA TATGCT AAC TCT GTT CAA GCT TAT GTG CAT 785 Thr Gly Ser Ala Lys Thr Tyr AlaAsn Ser Val Gln Ala Tyr Val His 215 220 225 GTT AGG GAT GTG GCA CTA GCCCAC ATT TTA GTC TTT GAG ACG CCT TCC 833 Val Arg Asp Val Ala Leu Ala HisIle Leu Val Phe Glu Thr Pro Ser 230 235 240 245 GCC TCC GGC CGT TAC CTTTGC TCT GAG AGC GTT CTC CAC CGT GGA GAG 881 Ala Ser Gly Arg Tyr Leu CysSer Glu Ser Val Leu His Arg Gly Glu 250 255 260 GTG GTG GAA ATC CTT GCAAAG TTC TTC CCT GAG TAC CCC ATC CCT ACC 929 Val Val Glu Ile Leu Ala LysPhe Phe Pro Glu Tyr Pro Ile Pro Thr 265 270 275 AAG TGC TCA GAT GAG AAGAAC CCA AGA AAA CAA CCT TAC AAG TTC TCA 977 Lys Cys Ser Asp Glu Lys AsnPro Arg Lys Gln Pro Tyr Lys Phe Ser 280 285 290 AAC CAG AAG CTA AGG GATCTG GGT TTC GAA TTC ACC CCA GTA AAG CAG 1025 Asn Gln Lys Leu Arg Asp LeuGly Phe Glu Phe Thr Pro Val Lys Gln 295 300 305 TGT CTG TAT GAA ACT GTTAAG AGT TTG CAG GAA AAG GGT CAC CTT CCA 1073 Cys Leu Tyr Glu Thr Val LysSer Leu Gln Glu Lys Gly His Leu Pro 310 315 320 325 ATC CCA AAA CAA GCTGCA GAA GAG TCT TTG AAA ATT CAA TAAGGCCTCT 1122 Ile Pro Lys Gln Ala AlaGlu Glu Ser Leu Lys Ile Gln 330 335 TGGAACTATT TATTAGGATT GTTCCATACCCCAAGTTTGG ATCGCAAATG CTAGGGAA 1182 GAGCATATTA AAGAATGCCA ATGTGCAGGTGTTTTAGTAT TTTACATGAA GAACTCTG 1242 TATCCTTGTG CTTATAATAA TTTTTTTCAAGTGAGTGTCT TCAAATGTTC AACTTGTA 1302 TGTGGTTGTC TAACTTTATC CAGTTTCAATATAAAAGAGG AACGATTCTA TGTCTTAA 1362 AAAAAAAAAA AAAA 1376 338 amino acidsamino acid linear protein unknown 8 Met Pro Val Asp Ala Ser Ser Leu SerGly Gln Gly Gln Thr Ile Cys 1 5 10 15 Val Thr Gly Gly Gly Gly Phe IleAla Ser Trp Met Val Lys Leu Leu 20 25 30 Leu Asp Lys Gly Tyr Thr Val ArgGly Thr Ala Arg Asn Pro Ala Asp 35 40 45 Pro Lys Asn Ser His Leu Arg GluLeu Glu Gly Ala Glu Glu Arg Leu 50 55 60 Thr Leu Cys Lys Ala Asp Leu LeuAsp Tyr Glu Ser Leu Lys Glu Gly 65 70 75 80 
Ile Gln Gly Cys Asp Gly ValPhe His Thr Ala Ser Pro Val Thr Asp 85 90 95 Asp Pro Glu Glu Met Val GluPro Ala Val Asn Gly Thr Lys Asn Val 100 105 110 Ile Ile Ala Ala Ala GluAla Lys Val Arg Arg Val Val Phe Thr Ser 115 120 125 Ser Ile Gly Ala ValTyr Met Asp Pro Asn Lys Gly Pro Asp Val Val 130 135 140 Ile Asp Glu SerCys Trp Ser Asp Leu Glu Phe Cys Lys Asn Thr Lys 145 150 155 160 Asn TrpTyr Cys Tyr Gly Lys Ala Val Ala Glu Gln Ala Ala Trp Asp 165 170 175 MetAla Lys Glu Lys Gly Val Asp Leu Val Val Val Asn Pro Val Leu 180 185 190Val Leu Gly Pro Leu Leu Gln Pro Thr Val Asn Ala Ser Ile Thr His 195 200205 Ile Leu Lys Tyr Leu Thr Gly Ser Ala Lys Thr Tyr Ala Asn Ser Val 210215 220 Gln Ala Tyr Val His Val Arg Asp Val Ala Leu Ala His Ile Leu Val225 230 235 240 Phe Glu Thr Pro Ser Ala Ser Gly Arg Tyr Leu Cys Ser GluSer Val 245 250 255 Leu His Arg Gly Glu Val Val Glu Ile Leu Ala Lys PhePhe Pro Glu 260 265 270 Tyr Pro Ile Pro Thr Lys Cys Ser Asp Glu Lys AsnPro Arg Lys Gln 275 280 285 Pro Tyr Lys Phe Ser Asn Gln Lys Leu Arg AspLeu Gly Phe Glu Phe 290 295 300 Thr Pro Val Lys Gln Cys Leu Tyr Glu ThrVal Lys Ser Leu Gln Glu 305 310 315 320 Lys Gly His Leu Pro Ile Pro LysGln Ala Ala Glu Glu Ser Leu Lys 325 330 335 Ile Gln 1273 base pairsnucleic acid double linear cDNA to mRNA unknown CDS 66..1091 9TCGTAGCTCT TCCCTTTCAC CAACAAGCTA GTTTAGACAA GTACAGTGGT ACTGTAAGAG 60CAACA ATG ACC GTT GTC GAC GCC GCC GCG CCG CAG CTG CCT GGC CAT 107 MetThr Val Val Asp Ala Ala Ala Pro Gln Leu Pro Gly His 1 5 10 GGG CAG ACCGTG TGC GTC ACC GGC GCC GCG GGG TAC ATC GCG TCG GGG 155 Gly Gln Thr ValCys Val Thr Gly Ala Ala Gly Tyr Ile Ala Ser Gly 15 20 25 30 CTC GTC AAGCTG CTC CTG GAG AGA GGC TAC ACC GTG AAG GGC ACA GTG 203 Leu Val Lys LeuLeu Leu Glu Arg Gly Tyr Thr Val Lys Gly Thr Val 35 40 45 AGG AAC CCA GATGAT CCC AAG AAC GCC CAC CTG AAG GCG CTG GAC GGC 251 Arg Asn Pro Asp AspPro Lys Asn Ala His Leu Lys Ala Leu Asp Gly 50 55 60 GCC ACC AAG AGG CTGATC CTC TGC AAA GCC GAC CTC CTC GAC TAC GAC 299 Ala Thr Lys Arg Leu 
IleLeu Cys Lys Ala Asp Leu Leu Asp Tyr Asp 65 70 75 GCC ATA TGC GCC GCC GTCGAG GGC TGC CAC GGC GTG TTC CAC ACC GCC 347 Ala Ile Cys Ala Ala Val GluGly Cys His Gly Val Phe His Thr Ala 80 85 90 TCT CCA GTC ACC GAT GAT CCTGAG CAG ATG GTG GAG CCG GCG GTG CGG 395 Ser Pro Val Thr Asp Asp Pro GluGln Met Val Glu Pro Ala Val Arg 95 100 105 110 GGC ACG GAG TAC GTG ATCAAC GCG GCA GCG GAT GCG GGA ACG GTG CGC 443 Gly Thr Glu Tyr Val Ile AsnAla Ala Ala Asp Ala Gly Thr Val Arg 115 120 125 CGG GTG GTG TTC ACG TCGTCA ATC GGT GCC ATC ACC ATG GAC CCC AAC 491 Arg Val Val Phe Thr Ser SerIle Gly Ala Ile Thr Met Asp Pro Asn 130 135 140 CGC GGT CCT GAC GTA GTCGTC AAT GAG TCC TGC TGG AGC GAC CTC GAA 539 Arg Gly Pro Asp Val Val ValAsn Glu Ser Cys Trp Ser Asp Leu Glu 145 150 155 TTC TGC AAG AAA ACC AAGAAC TGG TAC TGC TAC GGC AAG GCC GTG GCG 587 Phe Cys Lys Lys Thr Lys AsnTrp Tyr Cys Tyr Gly Lys Ala Val Ala 160 165 170 GAG CAG GCT GCG TGG GAGGCG GCC AGG AAG CGC GGC ATC GAC CTC GTC 635 Glu Gln Ala Ala Trp Glu AlaAla Arg Lys Arg Gly Ile Asp Leu Val 175 180 185 190 GTC GTG AAC CCT GTGCTC GTG GTA GGG CCG CTG CTG CAA CCA ACG GTG 683 Val Val Asn Pro Val LeuVal Val Gly Pro Leu Leu Gln Pro Thr Val 195 200 205 AAC GCT AGC GCC GCACAC ATC CTC AAG TAC CTC GAC GGC TCG GCC AAG 731 Asn Ala Ser Ala Ala HisIle Leu Lys Tyr Leu Asp Gly Ser Ala Lys 210 215 220 AAG TAC GCC AAC GCTGTG CAG TCA TAC GTA GAC GTG CGT GAC GTA GCC 779 Lys Tyr Ala Asn Ala ValGln Ser Tyr Val Asp Val Arg Asp Val Ala 225 230 235 GGC GCG CAC ATC CGGGTG TTC GAG GCG CCT GAG GCG TCG GGC CGG TAC 827 Gly Ala His Ile Arg ValPhe Glu Ala Pro Glu Ala Ser Gly Arg Tyr 240 245 250 CTC TGC GCC GAG CGCGTG CTG CAC CGT GGG GAC GTT GTC CAA ATC CTC 875 Leu Cys Ala Glu Arg ValLeu His Arg Gly Asp Val Val Gln Ile Leu 255 260 265 270 AGC AAA CTC TTGCCT GAG TAC CCT GTG CCA ACA AGG TGC TCT GAT GAA 923 Ser Lys Leu Leu ProGlu Tyr Pro Val Pro Thr Arg Cys Ser Asp Glu 275 280 285 GTG AAC CCA CGGAAG CAG CCT TAT AAG ATG TCC AAC CAG AAG CTG CAG 971 Val Asn Pro Arg LysGln Pro 
Tyr Lys Met Ser Asn Gln Lys Leu Gln 290 295 300 GAT CTT GGC CTCCAG TTC ACT CCT GTG AAC GAC TCT CTG TAT GAG ACC 1019 Asp Leu Gly Leu GlnPhe Thr Pro Val Asn Asp Ser Leu Tyr Glu Thr 305 310 315 GTG AAG AGC CTCCAG GAG AAG GGA CAT CTC CTA GTA CCA AGC AAA CCC 1067 Val Lys Ser Leu GlnGlu Lys Gly His Leu Leu Val Pro Ser Lys Pro 320 325 330 GAG GGA TTA AACGGT GTA ACG GCA TGATACTGCT AAAGAAGCAG CAGAGTTCA 1121 Glu Gly Leu Asn GlyVal Thr Ala 335 340 GTGCTCCTGT AACATGGTCA AACATGAGTT GTTTTTCTGTATAAATTCTA TCCAGTAT 1181 TGTTATTTAA GTGAACTAAG AGAACAGAAT ATTGTATCATCTTCGATGTC CAATACCT 1241 AAGTGATTTG TTTTGCCACC TAAAAAAAAA AA 1273 342amino acids amino acid linear protein unknown 10 Met Thr Val Val Asp AlaAla Ala Pro Gln Leu Pro Gly His Gly Gln 1 5 10 15 Thr Val Cys Val ThrGly Ala Ala Gly Tyr Ile Ala Ser Gly Leu Val 20 25 30 Lys Leu Leu Leu GluArg Gly Tyr Thr Val Lys Gly Thr Val Arg Asn 35 40 45 Pro Asp Asp Pro LysAsn Ala His Leu Lys Ala Leu Asp Gly Ala Thr 50 55 60 Lys Arg Leu Ile LeuCys Lys Ala Asp Leu Leu Asp Tyr Asp Ala Ile 65 70 75 80 Cys Ala Ala ValGlu Gly Cys His Gly Val Phe His Thr Ala Ser Pro 85 90 95 Val Thr Asp AspPro Glu Gln Met Val Glu Pro Ala Val Arg Gly Thr 100 105 110 Glu Tyr ValIle Asn Ala Ala Ala Asp Ala Gly Thr Val Arg Arg Val 115 120 125 Val PheThr Ser Ser Ile Gly Ala Ile Thr Met Asp Pro Asn Arg Gly 130 135 140 ProAsp Val Val Val Asn Glu Ser Cys Trp Ser Asp Leu Glu Phe Cys 145 150 155160 Lys Lys Thr Lys Asn Trp Tyr Cys Tyr Gly Lys Ala Val Ala Glu Gln 165170 175 Ala Ala Trp Glu Ala Ala Arg Lys Arg Gly Ile Asp Leu Val Val Val180 185 190 Asn Pro Val Leu Val Val Gly Pro Leu Leu Gln Pro Thr Val AsnAla 195 200 205 Ser Ala Ala His Ile Leu Lys Tyr Leu Asp Gly Ser Ala LysLys Tyr 210 215 220 Ala Asn Ala Val Gln Ser Tyr Val Asp Val Arg Asp ValAla Gly Ala 225 230 235 240 His Ile Arg Val Phe Glu Ala Pro Glu Ala SerGly Arg Tyr Leu Cys 245 250 255 Ala Glu Arg Val Leu His Arg Gly Asp ValVal Gln Ile Leu Ser Lys 260 265 270 Leu Leu Pro Glu Tyr Pro Val Pro ThrArg Cys Ser Asp Glu Val 
Asn 275 280 285 Pro Arg Lys Gln Pro Tyr Lys MetSer Asn Gln Lys Leu Gln Asp Leu 290 295 300 Gly Leu Gln Phe Thr Pro ValAsn Asp Ser Leu Tyr Glu Thr Val Lys 305 310 315 320 Ser Leu Gln Glu LysGly His Leu Leu Val Pro Ser Lys Pro Glu Gly 325 330 335 Leu Asn Gly ValThr Ala 340 1293 base pairs nucleic acid double linear cDNA to mRNAunknown CDS 95..1108 11 CCGAGCCTAT TTCTTCCCTA TATCCACTCA TCCTTGTCTTATATCATCAT CATCATCATC 60 TACCTAAACC TGAGCTCAAC AGAAAAGTAA TACC ATG CCGTCA GTT TCC GGC 112 Met Pro Ser Val Ser Gly 1 5 CAA ATC GTT TGT GTT ACTGGC GCC GGA GGT TTC ATC GCC TCT TGG CTC 160 Gln Ile Val Cys Val Thr GlyAla Gly Gly Phe Ile Ala Ser Trp Leu 10 15 20 GTT AAA ATT CTT CTG GAA AAAGGC TAC ACT GTT AGA GGA ACA GTA CGA 208 Val Lys Ile Leu Leu Glu Lys GlyTyr Thr Val Arg Gly Thr Val Arg 25 30 35 AAT CCA GAT GAT CGA AAA AAT AGTCAT TTG AGG GAG CTT GAA CGA GCA 256 Asn Pro Asp Asp Arg Lys Asn Ser HisLeu Arg Glu Leu Glu Arg Ala 40 45 50 AAA GAG ACA TTG ACT CTG TGC AGA GCTGAT CTT CTT GAT TTT CAG AGT 304 Lys Glu Thr Leu Thr Leu Cys Arg Ala AspLeu Leu Asp Phe Gln Ser 55 60 65 70 TTG CGA GAA GCA ATC AGC GGC TGT GACGGA GTT TTC CAC ACA CGT TCT 352 Leu Arg Glu Ala Ile Ser Gly Cys Asp GlyVal Phe His Thr Arg Ser 75 80 85 CCT GTC ACT GAT GAT CCA GAA CAA ATG GTGGAG CCA GCA GTT ATT GGT 400 Pro Val Thr Asp Asp Pro Glu Gln Met Val GluPro Ala Val Ile Gly 90 95 100 ACA AAG AAT GTG ATA ACG GCA GCA GCA GAGGCC AAG GTG CGA CGT GTG 448 Thr Lys Asn Val Ile Thr Ala Ala Ala Glu AlaLys Val Arg Arg Val 105 110 115 GTG TTC ACT TCG TCA ATT GGT GCT GTG TATATG GAC CCA AAC AGG GAC 496 Val Phe Thr Ser Ser Ile Gly Ala Val Tyr MetAsp Pro Asn Arg Asp 120 125 130 CCT GAT AAG GTT GTC GAC GAG ACT TGT TGGAGT GAT CCT GAC TTC TGC 544 Pro Asp Lys Val Val Asp Glu Thr Cys Trp SerAsp Pro Asp Phe Cys 135 140 145 150 AAA AAC ACC AAG AAT TGG TAT TGT TATGGG AAG ATG GTG GCA GAA CAA 592 Lys Asn Thr Lys Asn Trp Tyr Cys Tyr GlyLys Met Val Ala Glu Gln 155 160 165 GCA GCA TGG GAC GAA GCA AGG GAG AAAGGA GTC GAT TTG GTG GCA ATC 640 Ala Ala 
Trp Asp Glu Ala Arg Glu Lys GlyVal Asp Leu Val Ala Ile 170 175 180 AAC CCA GTG TTG GTG CTT GGA CCA CTGCTC CAA CAG AAT GTG AAT GCC 688 Asn Pro Val Leu Val Leu Gly Pro Leu LeuGln Gln Asn Val Asn Ala 185 190 195 AGT GTT CTT CAC ATC CAC AAG TAC CTAACT GGC TCT GCT AAA ACA TAT 736 Ser Val Leu His Ile His Lys Tyr Leu ThrGly Ser Ala Lys Thr Tyr 200 205 210 ACG TCC AAT TCA CTT CAG GCA TAT GTTCAT GTT AGG GAT GTG GCT TTA 784 Thr Ser Asn Ser Leu Gln Ala Tyr Val HisVal Arg Asp Val Ala Leu 215 220 225 230 CGT CAC ATA CTT GTG TAC GAG ACACCT TCT GCA TCT GGC CGT TAT CTC 832 Arg His Ile Leu Val Tyr Glu Thr ProSer Ala Ser Gly Arg Tyr Leu 235 240 245 TGT GCC GAG AGT GTG CTG CAT CGCTGC GAT GTG GTT GAA ATT CTC GCC 880 Cys Ala Glu Ser Val Leu His Arg CysAsp Val Val Glu Ile Leu Ala 250 255 260 AAA TTC TTC CCG GAG TAT CCT ATCCCC ACC AAG TGT TCA GAT GTG ACG 928 Lys Phe Phe Pro Glu Tyr Pro Ile ProThr Lys Cys Ser Asp Val Thr 265 270 275 AAG CCA AGG GTA AAA CCG TAC AAATTC TCA AAC CAA AAG CTA AAG GAT 976 Lys Pro Arg Val Lys Pro Tyr Lys PheSer Asn Gln Lys Leu Lys Asp 280 285 290 TTG GGT CTG GAG TTT ACA CCA GTACAA TGC TTA TAT GAA ACG GTG AAG 1024 Leu Gly Leu Glu Phe Thr Pro Val GlnCys Leu Tyr Glu Thr Val Lys 295 300 305 310 AGT CTA CAA GAG AAA GGT CACCTT CCA ATT CCT ACT CAA AAG GAT GAG 1072 Ser Leu Gln Glu Lys Gly His LeuPro Ile Pro Thr Gln Lys Asp Glu 315 320 325 ATT ATT CGA ATT CAG TCT GAGAAA TTC AGA AGC TCT TAGCATGTAT 1118 Ile Ile Arg Ile Gln Ser Glu Lys PheArg Ser Ser 330 335 TGAGGAAAAG GGATCAATGG TTAAAGTTGA CCATGGCGTTGTCCCTTTAT GTACCAAG 1178 CAAATGCACC TAGAAATTTA CTTGTCTACT CTGTTGTACTTTTACTTGTC ATGGAAAT 1238 TTTTAGTGTT TTCATTGTTA TGAGATATAT TTTGGTGTAAAAAAAAAAAA AAAAA 1293 338 amino acids amino acid linear protein unknown12 Met Pro Ser Val Ser Gly Gln Ile Val Cys Val Thr Gly Ala Gly Gly 1 510 15 Phe Ile Ala Ser Trp Leu Val Lys Ile Leu Leu Glu Lys Gly Tyr Thr 2025 30 Val Arg Gly Thr Val Arg Asn Pro Asp Asp Arg Lys Asn Ser His Leu 3540 45 Arg Glu Leu Glu Arg Ala Lys Glu Thr Leu Thr Leu Cys Arg 
Ala Asp 5055 60 Leu Leu Asp Phe Gln Ser Leu Arg Glu Ala Ile Ser Gly Cys Asp Gly 6570 75 80 Val Phe His Thr Arg Ser Pro Val Thr Asp Asp Pro Glu Gln Met Val85 90 95 Glu Pro Ala Val Ile Gly Thr Lys Asn Val Ile Thr Ala Ala Ala Glu100 105 110 Ala Lys Val Arg Arg Val Val Phe Thr Ser Ser Ile Gly Ala ValTyr 115 120 125 Met Asp Pro Asn Arg Asp Pro Asp Lys Val Val Asp Glu ThrCys Trp 130 135 140 Ser Asp Pro Asp Phe Cys Lys Asn Thr Lys Asn Trp TyrCys Tyr Gly 145 150 155 160 Lys Met Val Ala Glu Gln Ala Ala Trp Asp GluAla Arg Glu Lys Gly 165 170 175 Val Asp Leu Val Ala Ile Asn Pro Val LeuVal Leu Gly Pro Leu Leu 180 185 190 Gln Gln Asn Val Asn Ala Ser Val LeuHis Ile His Lys Tyr Leu Thr 195 200 205 Gly Ser Ala Lys Thr Tyr Thr SerAsn Ser Leu Gln Ala Tyr Val His 210 215 220 Val Arg Asp Val Ala Leu ArgHis Ile Leu Val Tyr Glu Thr Pro Ser 225 230 235 240 Ala Ser Gly Arg TyrLeu Cys Ala Glu Ser Val Leu His Arg Cys Asp 245 250 255 Val Val Glu IleLeu Ala Lys Phe Phe Pro Glu Tyr Pro Ile Pro Thr 260 265 270 Lys Cys SerAsp Val Thr Lys Pro Arg Val Lys Pro Tyr Lys Phe Ser 275 280 285 Asn GlnLys Leu Lys Asp Leu Gly Leu Glu Phe Thr Pro Val Gln Cys 290 295 300 LeuTyr Glu Thr Val Lys Ser Leu Gln Glu Lys Gly His Leu Pro Ile 305 310 315320 Pro Thr Gln Lys Asp Glu Ile Ile Arg Ile Gln Ser Glu Lys Phe Arg 325330 335 Ser Ser 1297 base pairs nucleic acid double linear cDNA to mRNAunknown CDS 136..1140 13 CGGCCGGGAC GACCCGTTCC TCTTCTTCCG GGTCACCGTCACCATGTTAC ACAACATCTC 60 CGGCTAAAAA AAAAAGGAAA AAAAGCGCAA CCTCCACCTCCTGAACCCCT CTCCCCCCT 120 GCCGGCAATC CCACC ATG CCC GTC GAC GCC CTC CCCGGT TCC GGC CAG ACC 171 Met Pro Val Asp Ala Leu Pro Gly Ser Gly Gln Thr1 5 10 GTC TGC GTC ACC GGC GCC GGC GGG TTC ATC GCC TCC TGG ATT GTC AAG219 Val Cys Val Thr Gly Ala Gly Gly Phe Ile Ala Ser Trp Ile Val Lys 1520 25 CTT CTC CTC GAG CGA GGC TAC ACC GTG CGA GGA ACC GTC AGG AAC CCA267 Leu Leu Leu Glu Arg Gly Tyr Thr Val Arg Gly Thr Val Arg Asn Pro 3035 40 GAC GAC CCG AAG AAT GGT CAT CTG AGA GAT CTG GAA GGA GCC AGC GAG315 Asp Asp 
Pro Lys Asn Gly His Leu Arg Asp Leu Glu Gly Ala Ser Glu 4550 55 60 AGG CTG ACG CTG TAC AAG GGT GAT CTG ATG GAC GAC GGG AGC TTG GAA363 Arg Leu Thr Leu Tyr Lys Gly Asp Leu Met Asp Asp Gly Ser Leu Glu 6570 75 GAA GCC ATC AAG GGG TGC GAC GGC GTC GTC CAC ACC GCC TCT CCG GTC411 Glu Ala Ile Lys Gly Cys Asp Gly Val Val His Thr Ala Ser Pro Val 8085 90 ACC GAC GAT CCT GAG CAA ATG GTG GAG CCA GCG GTG ATC GGG ACG AAA459 Thr Asp Asp Pro Glu Gln Met Val Glu Pro Ala Val Ile Gly Thr Lys 95100 105 AAT GTG ATC GTC GCA GCG GCG GAG GCC AAG GTC CGG CGG GTT GTG TTC507 Asn Val Ile Val Ala Ala Ala Glu Ala Lys Val Arg Arg Val Val Phe 110115 120 ACC TCC TCC ATC GGT GCA GTC ACC ATG GAC CCC AAC CGG GCA GAC GTT555 Thr Ser Ser Ile Gly Ala Val Thr Met Asp Pro Asn Arg Ala Asp Val 125130 135 140 GTG GTG GAC GAG TCT TGT TGG AGC GAC CTC GAA TTT TGC AAG AGCACT 603 Val Val Asp Glu Ser Cys Trp Ser Asp Leu Glu Phe Cys Lys Ser Thr145 150 155 AAG AAC TGG TAT TGC TAC GGC AAG GCA GTG GCG GAG AAG GCC GCTTGG 651 Lys Asn Trp Tyr Cys Tyr Gly Lys Ala Val Ala Glu Lys Ala Ala Trp160 165 170 CCA GAG GGC AAG GAG AGA GGG GTT GAC CTC GTG GTG ATT AAC CCTGTG 699 Pro Glu Gly Lys Glu Arg Gly Val Asp Leu Val Val Ile Asn Pro Val175 180 185 CTC GTG CTT GGA CCG CTC CTT CAG TCG ACG ATC AAT GCG AGC ATCATC 747 Leu Val Leu Gly Pro Leu Leu Gln Ser Thr Ile Asn Ala Ser Ile Ile190 195 200 CAC ATC CTC AAG TAC TTG ACT GGC TCA GCC AAG ACC TAC GCC AACTCG 795 His Ile Leu Lys Tyr Leu Thr Gly Ser Ala Lys Thr Tyr Ala Asn Ser205 210 215 220 GTC CAG GCG TAC GTG CAC GTC AAG GAC GTC GCG CTT GCC CACGTC CTT 843 Val Gln Ala Tyr Val His Val Lys Asp Val Ala Leu Ala His ValLeu 225 230 235 GTC TTG GAG ACC CCA TCC GCC TCA GGC CGC TAT TTG TGC GCCGAG AGC 891 Val Leu Glu Thr Pro Ser Ala Ser Gly Arg Tyr Leu Cys Ala GluSer 240 245 250 GTC CTC CAC CGT GGC GAT GTG GTG GAA ATC CTT GCC AAG TTCTTC CCT 939 Val Leu His Arg Gly Asp Val Val Glu Ile Leu Ala Lys Phe PhePro 255 260 265 GAG TAT AAT GTA CCG ACC AAG TGC TCT GAT GAG GTG AAC CCAAGA GTA 987 Glu Tyr Asn Val 
Pro Thr Lys Cys Ser Asp Glu Val Asn Pro ArgVal 270 275 280 AAA CCA TAC AAG TTC TCC AAC CAG AAG CTG AGA GAC TTG GGGCTC GAG 1035 Lys Pro Tyr Lys Phe Ser Asn Gln Lys Leu Arg Asp Leu Gly LeuGlu 285 290 295 300 TTC ACC CCG GTG AAG CAG TGC CTG TAC GAA ACT GTC AAGAGC TTG CAG 1083 Phe Thr Pro Val Lys Gln Cys Leu Tyr Glu Thr Val Lys SerLeu Gln 305 310 315 GAG AAA GGC CAC CTA CCA GTC CCC TCC CCG CCG GAA GATTCG GTG CGT 1131 Glu Lys Gly His Leu Pro Val Pro Ser Pro Pro Glu Asp SerVal Arg 320 325 330 ATT CAG GGA TGATCTTAGA TCCATCACGG TGCGCATTTGTAATCCGGAG 1180 Ile Gln Gly 335 AAATGAGAGA AACATGTGGG AATTTGTTTGTACTTTTCTA AGTCAAACCT GGAGATAC 1240 ACCCTGAGTT CTGCATTGGA ATGGAAGTTGTCAATTGTTC CAAAAAAAAA AAAAAAA 1297 335 amino acids amino acid linearprotein unknown 14 Met Pro Val Asp Ala Leu Pro Gly Ser Gly Gln Thr ValCys Val Thr 1 5 10 15 Gly Ala Gly Gly Phe Ile Ala Ser Trp Ile Val LysLeu Leu Leu Glu 20 25 30 Arg Gly Tyr Thr Val Arg Gly Thr Val Arg Asn ProAsp Asp Pro Lys 35 40 45 Asn Gly His Leu Arg Asp Leu Glu Gly Ala Ser GluArg Leu Thr Leu 50 55 60 Tyr Lys Gly Asp Leu Met Asp Asp Gly Ser Leu GluGlu Ala Ile Lys 65 70 75 80 Gly Cys Asp Gly Val Val His Thr Ala Ser ProVal Thr Asp Asp Pro 85 90 95 Glu Gln Met Val Glu Pro Ala Val Ile Gly ThrLys Asn Val Ile Val 100 105 110 Ala Ala Ala Glu Ala Lys Val Arg Arg ValVal Phe Thr Ser Ser Ile 115 120 125 Gly Ala Val Thr Met Asp Pro Asn ArgAla Asp Val Val Val Asp Glu 130 135 140 Ser Cys Trp Ser Asp Leu Glu PheCys Lys Ser Thr Lys Asn Trp Tyr 145 150 155 160 Cys Tyr Gly Lys Ala ValAla Glu Lys Ala Ala Trp Pro Glu Gly Lys 165 170 175 Glu Arg Gly Val AspLeu Val Val Ile Asn Pro Val Leu Val Leu Gly 180 185 190 Pro Leu Leu GlnSer Thr Ile Asn Ala Ser Ile Ile His Ile Leu Lys 195 200 205 Tyr Leu ThrGly Ser Ala Lys Thr Tyr Ala Asn Ser Val Gln Ala Tyr 210 215 220 Val HisVal Lys Asp Val Ala Leu Ala His Val Leu Val Leu Glu Thr 225 230 235 240Pro Ser Ala Ser Gly Arg Tyr Leu Cys Ala Glu Ser Val Leu His Arg 245 250255 Gly Asp Val Val Glu Ile Leu Ala Lys Phe Phe Pro Glu 
Tyr Asn Val 260265 270 Pro Thr Lys Cys Ser Asp Glu Val Asn Pro Arg Val Lys Pro Tyr Lys275 280 285 Phe Ser Asn Gln Lys Leu Arg Asp Leu Gly Leu Glu Phe Thr ProVal 290 295 300 Lys Gln Cys Leu Tyr Glu Thr Val Lys Ser Leu Gln Glu LysGly His 305 310 315 320 Leu Pro Val Pro Ser Pro Pro Glu Asp Ser Val ArgIle Gln Gly 325 330 335 7 amino acids amino acid single linear peptideunknown 15 Asn Trp Tyr Cys Tyr Gly Lys 1 5 13 amino acids amino acidsingle linear peptide unknown Modified-site /label= Xaa /note= “anyamino acid” 16 His Leu Pro Val Pro Xaa Pro Pro Glu Asp Ser Val Arg 1 510 13 amino acids amino acid single linear peptide unknown 17 Thr TyrAla Asn Ser Val Gln Ala Tyr Val His Val Lys 1 5 10 15 amino acids aminoacid single linear peptide unknown 18 Gly Cys Asp Gly Val Val His ThrAla Ser Pro Val Thr Asp Asp 1 5 10 15 12 amino acids amino acid singlelinear peptide unknown 19 Leu Arg Asp Leu Gly Leu Glu Phe Thr Pro ValLys 1 5 10 14 amino acids amino acid single linear peptide unknown 20Gly Asp Leu Met Asp Tyr Gly Ser Leu Glu Glu Ala Ile Lys 1 5 10 8 aminoacids amino acid single linear peptide unknown 21 Lys Asn Trp Tyr CysTyr Gly Lys 1 5 22 base pairs nucleic acid both linear cDNA unknown 22AARAAYTGGT AYTGYTAYGG AA 22 16 amino acids amino acid single linearpeptide unknown 23 Lys Gly Cys Asp Gly Val Val His Thr Ala Ser Pro ValThr Asp As 1 5 10 15 19 base pairs nucleic acid both linear cDNA unknown24 AARGGTGYGA YGGGTGTCA 19 13 amino acids amino acid single linearpeptide unknown 25 Lys Leu Arg Asp Leu Gly Leu Glu Phe Thr Pro Val Lys 15 10 14 base pairs nucleic acid both linear cDNA unknown 26 GARTTYACCCGTAA 14 15 amino acids amino acid single linear peptide unknown 27 LysGly Asp Leu Met Asp Tyr Gly Ser Leu Glu Glu Ala Ile Lys 1 5 10 15 21base pairs nucleic acid both linear cDNA unknown 28 AARGGGAYYTATGGAYTAYG G 21 26 base pairs nucleic acid both linear cDNA unknown 29GGCAATCCCC ATATGCCCGT CGACGC 26
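The relationship between the short peptides and the degenerate oligonucleotides listed above (for example peptide entry 21, Lys Asn Trp Tyr Cys Tyr Gly Lys, and oligonucleotide entry 22) can be illustrated by back-translating a peptide into IUPAC degenerate codons. The following Python sketch is an illustration only: the table `DEGENERATE_CODON` and the function `back_translate` are hypothetical names introduced here, the table uses the standard genetic code, and it deliberately omits Leu, Ser and Arg, whose six codons cannot be written as a single exact degenerate triplet.

```python
# Illustrative sketch only: names below are hypothetical, not from the patent.
# Standard genetic code written with IUPAC ambiguity symbols
# (R = A/G, Y = C/T, H = A/C/T, N = any base).
DEGENERATE_CODON = {
    "Ala": "GCN", "Asn": "AAY", "Asp": "GAY", "Cys": "TGY",
    "Gln": "CAR", "Glu": "GAR", "Gly": "GGN", "His": "CAY",
    "Ile": "ATH", "Lys": "AAR", "Met": "ATG", "Phe": "TTY",
    "Pro": "CCN", "Thr": "ACN", "Trp": "TGG", "Tyr": "TAY",
    "Val": "GTN",
}


def back_translate(peptide: str) -> str:
    """Turn a space-separated three-letter peptide into a degenerate DNA string."""
    codons = []
    for residue in peptide.split():
        if residue not in DEGENERATE_CODON:
            # Leu, Ser, Arg and unknown residues (Xaa) are not handled here.
            raise ValueError(f"no single degenerate codon for {residue!r}")
        codons.append(DEGENERATE_CODON[residue])
    return "".join(codons)


if __name__ == "__main__":
    # Peptide of listing entry 21.
    print(back_translate("Lys Asn Trp Tyr Cys Tyr Gly Lys"))
```

The first 20 positions of the result, AARAAYTGGTAYTGYTAYGG, coincide with the first 20 positions of oligonucleotide entry 22 as printed above; the remaining positions of that entry are left as they appear in the listing.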

What is claimed is:
 1. A method of producing transgenic plants within which the biosynthesis of lignins is regulated either in the sense of an increase, or in the sense of a reduction of the lignin levels produced, relative to the normal lignin levels produced in plants, comprising: transforming plant cells with a recombinant nucleotide sequence comprising one or more coding regions, wherein said coding regions are selected from the group consisting of: the nucleotide sequence represented by SEQ ID NO: 1, coding for a mRNA, said mRNA coding for the Cinnamoyl CoA Reductase (CCR) of lucerne represented by SEQ ID NO: 2; and the nucleotide sequence represented by SEQ ID NO: 3, coding for a mRNA, said mRNA coding for the CCR of maize represented by SEQ ID NO: 4, and the nucleotide sequence fully complementary to that represented by SEQ ID NO: 1, or SEQ ID NO: 3.
 2. An isolated DNA sequence comprising: the nucleotide sequence represented by SEQ ID NO: 1, coding for a mRNA, said mRNA coding for the Cinnamoyl CoA Reductase (CCR) represented by SEQ ID NO: 2, or the nucleotide sequence represented by SEQ ID NO: 3, coding for a mRNA, said mRNA coding for a protein represented by SEQ ID NO: 4.
 3. An isolated DNA sequence comprising: the nucleotide sequence fully complementary to that represented by SEQ ID NO: 1 or the nucleotide sequence fully complementary to that represented by SEQ ID NO: 3.
 4. An isolated mRNA selected from the mRNA coded by the DNA sequence represented by SEQ ID NO: 1 and the mRNA coded by the DNA sequence represented by SEQ ID NO: 3.
 5. An antisense mRNA comprising nucleotides fully complementary to a mRNA according to claim 4.
 6. An isolated nucleotide sequence coding for the CCRs represented by SEQ ID NO: 2 or SEQ ID NO: 4.
 7. An isolated complex formed between an antisense mRNA according to claim 5, and a CCR mRNA.
 8. An isolated recombinant nucleotide sequence comprising at least one DNA sequence according to claim 2, said sequence being inserted in a heterologous sequence.
 9. An isolated recombinant nucleotide sequence, comprising at least one fully complementary DNA sequence according to claim 3, inserted in a heterologous sequence.
 10. A process for the regulation of the biosynthesis of lignins in a plant, either by reducing, or by increasing the levels of lignin produced, relative to the normal levels of lignins produced in said plant, said process comprising the step of transforming cells of said plant using a vector containing a nucleotide sequence according to claim 2.
 11. A plant, plant fragment, cell, fruit, seed, or pollen, transformed by incorporation of at least one nucleotide sequence selected from the group consisting of: the nucleotide sequence represented by SEQ ID NO: 1, coding for a mRNA, said mRNA coding for the Cinnamoyl CoA Reductase (CCR) of lucerne represented by SEQ ID NO: 2, the nucleotide sequence represented by SEQ ID NO: 3, coding for a mRNA, said mRNA coding for the CCR of maize represented by SEQ ID NO: 4, and the nucleotide sequence fully complementary to that represented by SEQ ID NO: 1 and SEQ ID NO: 3.
 12. An isolated recombinant nucleotide sequence according to claim 9 wherein said at least one fully complementary DNA sequence is operatively linked to at least one of a promoter or a terminator.
 13. An isolated recombinant vector comprising a recombinant nucleotide sequence according to claim 12.
 14. A process for reducing the biosynthesis of lignins in plants, relative to the normal levels of lignins produced in these plants, said process comprising transforming cells of said plants with a vector containing a nucleotide sequence according to claim 3.
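
The expression "fully complementary" used in the claims above can be illustrated computationally. The following Python sketch assumes the usual Watson-Crick pairing rules (A with T, G with C) and reads the complementary strand in the 5' to 3' direction. The function names `reverse_complement` and `to_rna` are hypothetical, and the short fragment used is simply the first six codons of the coding region printed in sequence listing entry 11 (ATG CCG TCA GTT TCC GGC), chosen for brevity only.

```python
# Illustrative sketch only: function names are hypothetical, not from the patent.
# Watson-Crick pairing for unambiguous DNA bases.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the fully complementary strand, written 5' to 3'."""
    dna = dna.upper().replace(" ", "")
    return "".join(_COMPLEMENT[base] for base in reversed(dna))


def to_rna(dna: str) -> str:
    """Write a DNA strand as the RNA of identical sequence (T replaced by U)."""
    return dna.upper().replace(" ", "").replace("T", "U")


if __name__ == "__main__":
    sense_fragment = "ATG CCG TCA GTT TCC GGC"  # first six codons, listing entry 11
    antisense_dna = reverse_complement(sense_fragment)
    print(antisense_dna)          # GCCGGAAACTGACGGCAT
    print(to_rna(antisense_dna))  # GCCGGAAACUGACGGCAU
```

Applying `reverse_complement` twice returns the original fragment, which is a quick check that the two strands are complementary over their whole length.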